What’s new in Sysdig – October 2020

By Chris Kranz - OCTOBER 22, 2020

Welcome to another monthly update on what’s new from Sysdig!

CloudTrail support

This month, our big announcement was around CloudTrail and Fargate scanning support. CloudTrail support gives Sysdig Secure the ability to ingest CloudTrail events. These get fed into the runtime security engine, where rules can be created using the Falco rules language. This extends the reach of Sysdig Secure for our customers to provide runtime security detection not only for containers, processes, or Kubernetes runtime, but also for AWS Cloud. CloudTrail has support for over 170 AWS services, which gives Sysdig the capability to detect activity in any of these. Our initial examples include actions like creating a privileged user in IAM, creating a load balancer without TLS, and creating a public S3 bucket.

Fargate scanning support

Fargate image scanning fits directly into your DevOps pipelines and workflows. Sysdig can be used in several ways to provide security scanning of Fargate workloads. The automated Fargate image scanning is new here, but it’s worth reiterating all the ways we provide coverage for AWS workloads:

  1. Integrate into your CI/CD pipelines, including AWS CodePipeline and AWS CodeBuild, to automatically scan images as they are being built and before they are pushed to a registry.
  2. Integrate into ECR to automatically scan any image pushed to the registry, including images pushed manually outside of a build pipeline.
  3. And now, integrate directly with Fargate to detect a new job and automatically scan the image, even if it came from an external registry.

By the way, all these scanning jobs can be performed locally, so your images never leave your AWS account and we won’t ask you to expose access.

Increased security and saving on egress costs!

You can find all of our AWS integrations on our partner page, because there’s more than just the new stuff we launched this past month!

Other product updates

As always, please check out our own Release Notes for more details on product updates, and ping your local Sysdig contact if you have any questions about anything covered here.

Sysdig Secure

Event Forwarding: Kafka and Webhook Added

Two new supported integrations, Kafka topic and Webhook, have been added to the Sysdig Secure Event Forwarder.

The Kafka topic integration includes support for:

  • Multiple Kafka brokers
  • Partitioner/Balancer algorithms: Murmur2, Round robin, Least bytes, Hash, CRC32
  • Compression algorithms: LZ4, Snappy, Gzip, Zstandard
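
To make the partitioner options a little more concrete, here is a minimal Python sketch of how a hash-style partitioner (CRC32 in this case) maps an event key to a partition, next to a round-robin balancer that ignores the key. The function names and the sample key are hypothetical and chosen for illustration; this shows the general technique, not the Event Forwarder’s own code.

```python
import itertools
import zlib


def crc32_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition using a CRC32 hash of the key."""
    return zlib.crc32(key) % num_partitions


def round_robin(num_partitions: int):
    """Return an iterator that cycles through partition numbers, ignoring keys."""
    return itertools.cycle(range(num_partitions))


if __name__ == "__main__":
    key = b"policy-event"  # hypothetical event key
    print("crc32 partition:", crc32_partition(key, 6))
    balancer = round_robin(3)
    print("round robin:", [next(balancer) for _ in range(5)])
```

A hash partitioner keeps all events with the same key on the same partition, which preserves their relative order; round robin spreads the load evenly but gives up that per-key ordering.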

The Webhook integration includes support for:

  • Authentication methods: Basic authentication, Bearer Token, and Signature Header
  • Custom headers defined by the user to accommodate any additional parameter required on the receiving end.

How customers are using this

This request was from customers looking to aggregate security events from multiple sources into a centralized platform. Sometimes, this was to allow for a centralized SIEM for visibility and management, but we also have several customers doing this for analytics purposes. The Kafka streams can be used to store security events in a big data / data warehouse platform for further analytics.

Vulnerability Exceptions Handling Enhanced

The Vulnerability Exceptions feature in Sysdig Secure has been redesigned and enhanced.

It now offers:

  • Additional vulnerability and feed context.
  • Precise mapping between images and their associated exceptions.
  • A better exception management lifecycle.
  • Multiple vulnerability lists, which can be flexibly assigned to different image sets (or just a particular image) by using the scanning policy assignments.
  • Additional information displayed to improve team awareness and security context:
    • Vulnerability description.
    • User-defined notes.
    • Vulnerability feed info, with severities and links as provided per feed.
  • Configurable expiration dates:
    • An exception is automatically disabled when the expiration date is met.
    • Day resolution, all times relative to 0:00 UTC.
  • Enhanced workflow integration with the “Scan results” page for an individual image, with the ability to quickly append a flagged vulnerability to a list.
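
The expiration behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, `None` stands in for the “Never” setting, and we assume the exception stops applying at 0:00 UTC on the expiration date.

```python
from datetime import date, datetime, time, timezone
from typing import Optional


def exception_is_active(expires_on: Optional[date], now: datetime) -> bool:
    """Return True while a vulnerability exception still applies.

    expires_on=None models an exception that never expires.
    Assumption: the exception is disabled at 0:00 UTC on the expiration day.
    `now` must be a timezone-aware datetime.
    """
    if expires_on is None:
        return True
    # Day resolution: the cutoff is midnight UTC at the start of that day.
    cutoff = datetime.combine(expires_on, time(0, 0), tzinfo=timezone.utc)
    return now.astimezone(timezone.utc) < cutoff


if __name__ == "__main__":
    expiry = date(2020, 11, 1)
    just_before = datetime(2020, 10, 31, 23, 59, tzinfo=timezone.utc)
    print(exception_is_active(expiry, just_before))
```

Because everything is evaluated relative to UTC, teams in different time zones see an exception expire at the same instant.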

Migration: The exception and evaluation behavior in the current environment will be maintained after the feature upgrade. In particular:

  • Pre-existing vulnerability exceptions will be migrated to the “Default exceptions list.”
  • The “Default exceptions list” will be assigned to every pre-existing policy assignment.
  • The expiration date of every pre-existing vulnerability exception will be set to “Never.”

See also: Manage Vulnerability Exceptions and Global Lists.

How customers are using this

Some of our customers told us they wanted to allow temporary exceptions, for example while they evaluated a vulnerability or looked into different ways of mitigating it. Many vulnerabilities we’ve seen come from retired libraries that are no longer used, so our customers wanted the ability to put in a temporary exclusion until the next sprint, when an updated image would be pushed out.

Many customers have more permanent exception requirements, often because after analyzing a specific vulnerability, they have implemented additional mitigation controls and have determined the potential threat is negligible. Just putting in an exception doesn’t help when being audited, so having the ability to add notes to an exception is very important, and means everyone can understand what has been done and why a particular exclusion rule has been implemented.

And of course different business units have different rules. Many of our customers work in financial services, so while certain infrastructure workloads might be acceptable to have exceptions added, a different set of rules is often required for applications handling financial data, card payment services, etc. The flexibility of creating different rules to apply to different areas of the business is very important, and as always with Sysdig, you can use the native labels that already exist in Kubernetes, in container image repositories, or in the containers and hosts themselves.

AWS Threat Detection using CloudTrail and Sysdig Secure

Sysdig is happy to announce the general availability of a CloudFormation Template that will deploy a cloud-native operational security engine. By leveraging AWS CloudTrail and the Falco language, you can detect any unexpected or unwanted behavior in your AWS accounts.

Sysdig Cloud Connector leverages AWS CloudTrail as the source of truth for enabling governance, compliance, operational auditing, and risk auditing for your AWS account.

Every API action over your infrastructure resources is recorded as a set of CloudTrail entries. Once the integration is deployed in your infrastructure, the Sysdig Cloud Connector can analyze these entries in real-time and provide AWS threat detection by filtering them against a flexible set of security rules.

Example detection rules included in this release:

  • Attach a user to an Administrator Policy
  • Create an HTTP Target Group without SSL
  • Deactivate MFA for user access
  • Delete S3 bucket encryption
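
To give a feel for what “filtering entries against rules” means, here is a rough Python sketch that matches CloudTrail records against three of the example detections. The real rules are written in the Falco rules language and evaluated by the Sysdig Cloud Connector; the rule table and function names below are hypothetical. The sketch relies only on the standard CloudTrail record fields `eventSource`, `eventName`, and `requestParameters`.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]

# ARN of the AWS managed AdministratorAccess policy.
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def _params(event: Event) -> Dict[str, Any]:
    """Return requestParameters, tolerating records where it is null."""
    return event.get("requestParameters") or {}


# Hypothetical rule table: rule name -> predicate over one CloudTrail record.
RULES: Dict[str, Callable[[Event], bool]] = {
    "Attach a user to an Administrator Policy": lambda e: (
        e.get("eventSource") == "iam.amazonaws.com"
        and e.get("eventName") == "AttachUserPolicy"
        and _params(e).get("policyArn") == ADMIN_POLICY_ARN
    ),
    "Deactivate MFA for user access": lambda e: (
        e.get("eventSource") == "iam.amazonaws.com"
        and e.get("eventName") == "DeactivateMFADevice"
    ),
    "Delete S3 bucket encryption": lambda e: (
        e.get("eventSource") == "s3.amazonaws.com"
        and e.get("eventName") == "DeleteBucketEncryption"
    ),
}


def matching_rules(event: Event) -> List[str]:
    """Return the names of all rules that this CloudTrail record triggers."""
    return [name for name, predicate in RULES.items() if predicate(event)]


if __name__ == "__main__":
    record = {
        "eventSource": "iam.amazonaws.com",
        "eventName": "DeactivateMFADevice",
        "requestParameters": {"userName": "alice"},  # sample data
    }
    print(matching_rules(record))
```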

Sysdig Cloud Connector provides several notification options, including sending security findings to AWS CloudWatch and AWS Security Hub. When configured, you can consume the security events without leaving your cloud console.

See also: https://sysdiglabs.github.io/cloud-connector/.

How customers are using this

This is a big extension of the existing Secure Runtime capabilities, and while it’s still very new, we’re already seeing customers using this to get visibility into their AWS accounts. Specifically, this is being used for some of the audit requirements where they need a clear record of all IAM users / roles that have administrator privileges attached, or to validate that all load balancers are using TLS. We’re being told this is hugely useful in checking some of the compliance requirement boxes.

Automated Fargate Image Scanning

Sysdig is pleased to announce the general availability of a new integration, leveraging the Sysdig Inline Scanning capabilities to automatically analyze the base images used for any task created using AWS Elastic Container Service (ECS or Fargate).

  • Straightforward deployment using a CloudFormation template – the only mandatory parameter is the Sysdig API token.
  • Inline scanning living inside your AWS account means improved security:
    • No need to expose or configure private AWS registries.
    • Only image metadata is sent to Sysdig Secure, not the actual image contents.
    • No sensitive information ever leaves your AWS account.
    • An ephemeral task will be spawned to analyze each discovered image in parallel.
  • Each time you deploy a new task in AWS ECS/Fargate, an EventBridge event will be triggered and a lambda function will parse which images need to be analyzed by the CodeBuild pipeline job.
    • Fully automated.
    • Scan results and scanning policies are still controlled from a single security governance point using Sysdig Secure.
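
As a rough illustration of the event-driven step above, the following Python sketch shows a Lambda-style handler that pulls the container images out of an incoming event. The event shape (`detail.containers[].image`) is an assumption modeled on ECS task state change events, and the function names are hypothetical; the template that Sysdig ships may parse a different event and hands the images to a CodeBuild job rather than returning them.

```python
from typing import Any, Dict, List


def images_to_scan(event: Dict[str, Any]) -> List[str]:
    """Return the unique container images referenced by a task event.

    Assumption: images are listed under detail.containers[].image.
    """
    containers = event.get("detail", {}).get("containers", [])
    images: List[str] = []
    for container in containers:
        image = container.get("image")
        if image and image not in images:
            images.append(image)  # keep first-seen order, skip duplicates
    return images


def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Lambda-style entry point.

    This sketch only reports the images; a real integration would start
    one scan job per image at this point.
    """
    return {"images": images_to_scan(event)}


if __name__ == "__main__":
    sample = {
        "detail": {
            "containers": [
                {"name": "web", "image": "nginx:1.19"},
                {"name": "sidecar", "image": "nginx:1.19"},
                {"name": "app", "image": "example.com/team/app:2.0"},
            ]
        }
    }
    print(handler(sample, None))
```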

How customers are using this

Several of our customers had been looking forward to this feature and have jumped straight into using it. When combined with our ECR and CodePipeline integrations, this extends coverage to the most common threat models for workloads running in Fargate. Regardless of where a container has come from, Sysdig has this covered.

We hear this is also great for our larger environments where there are multiple accounts, or some accounts have more free access so can pull images from a variety of external sources. This provides the final gate of protection to make sure everything that is run in Fargate has been scanned, even if it came from an outside or alternative repo. Several of our customers allow their users to choose what repo and build pipeline technology they prefer (or are dealing with multiple technologies from company mergers and acquisitions). This allows them to simply provide a final gate, so even if they haven’t integrated Sysdig into all the various build pipelines or repositories, or just need to support unmanaged repositories, they can still get image scanning protection once the containers are being run in their environment.

Regulatory Compliance Validation Engine

This is the first release for the regulatory compliance validation engine. Included in this release are the PCI-DSS control checks for which Sysdig has coverage using features such as image scanning, Falco rules, runtime policies, captures, topology maps, and more. This release allows us to expand further into other regulatory compliance frameworks in the future, but this initial release focuses on PCI-DSS.

This new feature maps the various compliance controls against the coverage that Sysdig provides, and where coverage is not currently being met, guidance is provided to show how to provide coverage. The report is designed to allow quick visibility into the current state of compliance, and can be used by auditors to validate the controls in place.

This feature will be fully released during the week of Oct. 12, and is currently under limited release. If you’re reading this and don’t see this enabled, but would like to test this, please reach out to your Sysdig contacts.

How customers are using this

We’ve heard from our financial services customers that this is an extremely useful feature, and they’re using it to track their compliance obligations to make sure they have the coverage they thought they had. Customers have told us that the need to have a continual up-to-date view of their current compliance posture and exposure is essential for their business, and is also something that auditors are starting to ask for. Several organizations are now implementing randomized internal audits, so the compliance dashboard is proving to be a very useful tool for them to prove compliance of their containerized environments.

Falco Rules Updates

The latest version of the Falco rules is Sysdig 0.10.0.

New tags have been added to simplify finding relevant rules for your requirements and use-cases:

  • SOC2
  • NIST 800-53
  • PCI-DSS
  • NIST 800-190

Rule changes (diff between 0.10.0, which is the latest at time of writing, and 0.8.3, which was covered last month):

  • All Rules: Add user.loginuid as an output field. This uid is generally unchanging across sudo/su commands, and can more reliably identify users.
  • Write below root: Events will not be triggered if the process name is missing.
  • Delete or rename shell history: Ignore docker programs that would prevent modifying shell history, when the path is expressed within the container filesystem (/.bash_history) and host filesystem (/var/lib/docker/overlay/.../.bash_history).
  • Launch Sensitive Mount Container: Change image matching to correctly identify Sysdig images as compared to names starting with “sysdig…”
  • Detect shell history deletion: Ignore paths below /var/lib/docker. For example, the container filesystem overlay images that are removed when a container is removed.
  • Packet socket created in container: Now enabled by default.
  • Launch Privileged Container: Add additional images that can run with privileged=true.
  • Launch Sensitive Mount Container: Fix a typo that allows docker.io/sysdig/agent-slim to perform sensitive mounts.
  • Read sensitive file untrusted: Allow linux-bench to read sensitive files containing user information.
  • Update Package Repository: Restrict checks to files below known package management directories.
  • Write below etc: Add exceptions related to calico within containers.
  • Write below root: Allow mysqlsh to write to /root/.mysqlsh.
  • Read sensitive file untrusted: Allow google_oslogin_{control} to read sensitive files.
  • Change thread namespace: Trigger only when the process name is known.
  • Create HostNetwork Pod: Allow several images related to GKE + default metrics/routing services to run with hostnetwork=true.
  • Disallowed Kubernetes User: Add several known Kubernetes users to the allowed list.
  • Pod Created in Kube Namespace: Allow several images related to GKE + default metrics/routing services to run in kube-system/kube-public namespaces.
  • System ClusterRole Modified/Deleted: Allow modifications to the role system:managed-certificate-controller.

Sysdig Monitor

Prometheus native Service Discovery

Service discovery is how you find which endpoints to scrape for metrics inside your cluster or environment. As of agent v10.5.0, Sysdig supports the native Prometheus service discovery. You can add any scrape configuration to the Sysdig prometheus.yaml file, just as you would in a vanilla Prometheus. We call this Promscrape v2.

Until now, the Sysdig agent would auto discover which endpoints to scrape using the process_filter rules in dragent.yaml under the prometheus: section.

Now, when Prometheus native Service Discovery is enabled in dragent.yaml, with prom_service_discovery: true under the prometheus: section, the new version of Promscrape will use the configured prometheus.yaml to find which endpoints to scrape.

The default prometheus.yaml can be found in the Sysdig agent container and in our docs site. If you want to customize it, we recommend that you mount it through an external volume, typically a Kubernetes ConfigMap.

Promscrape v2 supports all types of scraping configuration to support advanced use cases, such as collecting metrics from existing Prometheus servers through Federation, or multi-target exporter pattern, typically used by the blackbox-exporter.

Full configuration options and details can be found in the Sysdig documentation.

How customers are using this

We want to make the experience of using Prometheus with Sysdig as easy as possible while maintaining all the native compatibility. This configuration approach allows you to maintain the same setup you would use in a vanilla Prometheus environment, with Federation and multi-target endpoints being the main use cases.

We have heard from several customers that have applications that are emitting thousands of time series that they just don’t need. Being able to easily create rules to drop unnecessary metrics makes it easier to focus on the important metrics, and improves their ability to scale.

Relabeling also lets you ingest Prometheus metrics without asking teams to change how they expose metrics in their applications or exporters. With relabeling, they can introduce changes to bring parity across different environments or business units that might be using different labeling conventions for similar metrics. This makes it easier to compare them.
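
Here is a small Python sketch of what dropping and relabeling do to scraped series. In practice you would express this declaratively with relabeling rules in prometheus.yaml; the function names, the `go_gc_.*` pattern, and the `app`/`service` labels below are made up for illustration. As in Prometheus, the pattern has to match the whole metric name.

```python
import re
from typing import Dict, Iterable, Iterator

Labels = Dict[str, str]


def drop_by_name(series: Iterable[Labels], pattern: str) -> Iterator[Labels]:
    """Yield only the series whose metric name does NOT fully match the pattern."""
    rx = re.compile(pattern)
    for labels in series:
        if rx.fullmatch(labels.get("__name__", "")):
            continue  # unwanted series: do not ingest it
        yield labels


def rename_label(labels: Labels, old: str, new: str) -> Labels:
    """Return a copy of the label set with one label renamed, if present."""
    renamed = dict(labels)
    if old in renamed:
        renamed[new] = renamed.pop(old)
    return renamed


if __name__ == "__main__":
    scraped = [
        {"__name__": "go_gc_duration_seconds", "app": "cart"},
        {"__name__": "http_requests_total", "app": "cart"},
    ]
    kept = [rename_label(s, "app", "service") for s in drop_by_name(scraped, "go_gc_.*")]
    print(kept)
```

Dropping happens before anything is stored, so the unwanted time series never count against your metric limits, and renaming labels at ingestion means the application teams do not have to touch their exporters.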

Minor Updates

  • Preserve shared dashboards on user deletion (previously these would be lost).
  • Time navigation in Events to allow you to quickly jump back in time to understand historically what happened in your infrastructure.
  • “Zoom out” button added to dashboards time navigation to allow the amount of time scoped for the dashboard to be quickly extended (i.e., jump from one hour to two hours to four hours, etc.).

Sysdig Agents

Sysdig Agent

The latest Sysdig Agent release is 10.5.1. Below is a diff of updates since 10.4.1, which we covered in our last update. There are some important fixes in this agent release, and we highly recommend upgrading to 10.5.x as soon as you can. If you aren’t using :latest, simply edit your DaemonSet, reapply it, and restart the pods.

New Features & Enhancements
  • Enable communication between agent and collector through an HTTP(s) proxy server.
  • Default Prometheus configuration file – see the notes above in Sysdig Monitor about Promscrape v2.
  • Added new rules to the Prometheus configuration to honor pod annotations.
  • Secure mode – the Sysdig agent now supports a secure mode that offers Secure-only features. See Secure Mode for more information.
  • Improved reconnection logic if the Agent loses connection (or fails to initially connect) with the Collector.
Fixes
  • Addressed vulnerabilities reported in the agent and agent-slim containers, including the one for CVE-2017-18640 in a dependency library related to image scanning.
  • Prometheus metrics can now be scraped from endpoints in Docker containers with remapped port numbers.
  • Prevent agent crashes in large systems – the agent now starts faster on systems with thousands of processes and hundreds of containers.
  • Warning for Prometheus metric limit – the agent logs a warning once a minute when the Prometheus metric limit is reached.
  • Transmitting Prometheus metrics works as expected when service discovery is enabled.
  • Appcheck metrics no longer go missing – fixed a problem that would cause certain appcheck metrics to be missing when 10-second aggregation in the agent is enabled.
  • Agent now times out if connection attempts to the Collector don’t work.
  • Agent now collects JMX metrics from new process following a Java service restart.
  • Pod to Service connection – fixed a problem that caused the UI to show a pod under an incorrect service if other services exist in different namespaces with the same selectors. This happened when the thin_cointerface_enabled property was set to true.
  • Syscall fast rule triggers as expected – fixed the evaluation of secure fast engine syscall rules when the If Not Matching rule is selected.
  • Pods are no longer associated with incorrect deployments – fixed a problem that could cause a pod to be associated with incorrect deployments.

Helm Chart

The Helm Chart 1.10.1 has been released to support Agent 10.5.1. As mentioned above, this is a recommended upgrade.

https://charts.sysdig.com/

Node Image Analyzer

Versions 0.1.3 and 0.1.4 were released this month. Most of the recent fixes were internal, making it more efficient and cleaning up some legacy code and libraries, as well as expanding support for Google COS.

If you haven’t used the node image analyzer yet, we definitely recommend it! The node image analyzer provides the capability to scan images as soon as they start running on hosts where the analyzer is installed. It is typically installed alongside the Sysdig agent container. This component was introduced to reduce dependencies on analyzing images within the Sysdig backend (SaaS or On-prem). Some advantages include:

  • Sharing credentials with the Sysdig backend in order to pull images is not required.
  • Sharing the image content, and potentially code, with the Sysdig backend is not required; only metadata will be sent out.
  • Opening a network route to allow the Sysdig backend to reach the user’s registries is not required.
  • Reduced egress bandwidth costs.

If the node analyzer is installed, there is no longer any need to manually trigger running image scans.

Node image analyzer can be installed as part of the Sysdig Agent install.

Inline Scanning Engine

Version 1.0.4 was released this month. The following changes were made:

  • Reduced image size (down to 200MB), which has the knock-on effect of improving first-run latency.
  • More robust and reliable data transfer between the analyzer and the scanning backend.
  • Fixes for minor bugs reported by customers concerning shell string and parameter processing.

SDK, CLI and Tools

Sysdig CLI

v0.4.6, v0.5.0, and v0.6.1 were released and include the following updates:

New Features
  • Remove inline scanning option: sdc-cli scanning image inline-scan is no longer available. If you need to use Inline Scanning, please check out the Inline Scanning Engine.
  • Update sdcclient to 0.13.0 with support for Dashboards v3.
  • This makes client_v3 unnecessary, so it is being removed. This does not affect users of the CLI.
  • Add vulnerability detail and exception management with the following new commands:
    sdc-cli scanning vulnerability add_exception
    sdc-cli scanning vulnerability del_exception
    sdc-cli scanning vulnerability get_info
    sdc-cli scanning vulnerability bundle list
    sdc-cli scanning vulnerability bundle add
    sdc-cli scanning vulnerability bundle del
    sdc-cli scanning vulnerability bundle get
    
Bug Fixes
  • While updating the client, we found that some dashboards v2 calls were using the v3 endpoint, which was not backwards compatible. This has been solved by splitting the use cases into v2 and v3 and testing them independently.
  • This release also addresses an API change: the username format in dashboards is now “FirstName LastName (Email)” instead of just “Email”, so dashboards restored without --all-users were being ignored because they were not recognized as owned by the user.
  • Fixed a KeyError when restoring an old version of the dashboards.
  • Update dependencies to work with Python>=3.8.4.
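
If you have your own scripts that match dashboards to their owners, the username format change matters there too. Here is a minimal Python sketch of one way to handle both formats; the function name and the regular expression are our own illustration, not the CLI’s actual implementation.

```python
import re

# Matches the newer "FirstName LastName (Email)" format.
_OWNER_RE = re.compile(r"^(?P<name>.*)\((?P<email>[^()]+)\)\s*$")


def owner_email(username: str) -> str:
    """Extract the email from either username format.

    Falls back to the input itself when it is not in the
    "FirstName LastName (Email)" form, i.e. the older "Email" format.
    """
    username = username.strip()
    match = _OWNER_RE.match(username)
    return match.group("email").strip() if match else username


if __name__ == "__main__":
    print(owner_email("Jane Doe (jane@example.com)"))
    print(owner_email("jane@example.com"))
```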

https://github.com/sysdiglabs/sysdig-platform-cli/releases/tag/v0.5.0 https://sysdiglabs.github.io/sysdig-platform-cli/

Python SDK

v0.12.0 and v0.13.0 were released and include the following updates:

Bug Fixes
  • examples: Updating a map returns null
  • examples: Do not append ‘updated via’ in restore_alerts example
  • Add append field to falco macro and list creation
  • Falco rule version comparison
  • json.dumps with map
  • List events can receive a name to filter
  • List events was not using v2 API
  • Strip / characters from the URL
  • The Policy API has changed origins and still requires Secure UI
Added Features
  • Add support for access keys
  • Add vulnerability detail method for the SdScanningClient
    • get_vulnerability_details
  • Add vulnerability exception methods for the SdScanningClient
    • add_vulnerability_exception_bundle
    • delete_vulnerability_exception_bundle
    • list_vulnerability_exception_bundles
    • get_vulnerability_exception_bundle
    • add_vulnerability_exception
    • delete_vulnerability_exception
    • update_vulnerability_exception
BREAKING CHANGES
  • Add support for Dashboards v3
    • Splits the dashboard code of SdMonitorClient into two DashboardClient classes, one for v2 and one for v3. By default, SdMonitorClient now inherits from DashboardClientV3 and therefore implements its full interface, but the signatures of the following methods have changed:
      • add_dashboard_panel
      • remove_dashboard_panel

        This is required because Dashboards v3 does not implement the same kinds of panels that v2 implemented.

https://github.com/sysdiglabs/sysdig-sdk-python/releases/tag/v0.13.0

Terraform Provider

v0.5.0 was released and includes the following updates:

  • Removed Resource: The sysdig_secure_notification_channel resource was marked as deprecated in the previous version, and this release removes it.
  • Terraform SDK has been updated to v2 and now makes use of Go’s context for cancellation, which improves reliability when cancelling a terraform apply.
  • New Resource: sysdig_monitor_dashboard that implements Dashboards v3 with PromQL support.
  • New Datasource: sysdig_current_user. Retrieves information from the current user performing API calls.
  • Admins are now ignored when creating teams with sysdig_monitor_team and sysdig_secure_team, since they are added by default.

Monitor Documentation
Secure Documentation

Promcat Resources

Just a reminder, Promcat.io is a curated set of Prometheus exporters for which Sysdig provides full support to our customers. It is publicly available, so non-customers can still make use of it, although we won’t be able to offer them official support.

The following new Prometheus exporters have been added to the website:

  • Cassandra
  • AWS EKS with Fargate
  • Oracle DB
  • CoreDNS (as integration and as part of Control Plane)
  • CEPH
  • AWS SQS

We also made the following enhancements:

  • Added ECS on EC2 in Fargate
  • Add scope variables in Sysdig dashboards
  • Update AWS resources to Yace v0.18
  • Update helmfile with etcd secrets
  • Fixed a bug in OpenShift recording rules

Deprecation Notices

We are going to remove the action to scan an unscanned image from the image scanning alerts. The alternative methods (including inline scanning and node image analyzer) provide a faster and more secure method for scanning any unscanned images.
