How to deal with ransomware on Azure

Let’s dig deeper into the techniques used by attackers and the mitigations you should implement when ransomware on Azure affects you.

By now, we should all be aware of ransomware from the constant news coverage of this well-known threat. As we explained in the anatomy of a cloud attack, ransomware is a way for attackers to make money once they gain control of your accounts: they encrypt your data, restricting your access to your systems.

This blog post covers ways to mitigate the techniques attackers use to spread ransomware in an Azure environment.

Azure threat research matrix

We will use the cloud-native toolset available in Microsoft Azure to establish a series of best practices. These should limit the blast radius of a user who has gained access to the cloud environment, and ultimately stop the attack as early as possible in the kill chain.

If you are familiar with the Azure Threat Research Matrix, an alternative to the MITRE ATT&CK framework, you'll notice that Microsoft no longer treats “Initial Access” as the first stage in the Azure threat detection lifecycle.

The Azure Threat Research Matrix starts with the reconnaissance stage, which is where we need to focus first. This is a classic approach in penetration testing: the attacker tries to gather as much information as they can, either passively or actively (e.g., IP discovery or port scanning of publicly accessible resources). In this Prometheus exposure talk, we explained how attackers can use exposed monitoring tools to fingerprint an entire Kubernetes cluster, and how to close that door.

We won’t cover Kubernetes security in depth in this blog post. If you’re interested in the various attack vectors in Kubernetes, we recommend checking out Kubernetes Goat. It is an intentionally vulnerable environment with hands-on labs for learning how to secure K8s clusters. For an introduction to Kubernetes security, you can check out Sysdig’s ‘Learn Cloud Native’ blog.

From an exposure perspective in the cloud, the attacker can gain access in a multitude of ways.

There could be a compromised user account.
A rogue user may have left the organization while still holding full or shared permissions.
There could be a simple misconfiguration identified in your public-facing application or service.
An incorrectly configured piece of code could open up a vulnerability within your applications.

Screenshot: Azure Threat Research Matrix

Ransomware is the end result of a series of threat detection failures in the cloud. It’s not the first problem you have. To prevent this type of attack (in this case, ransomware on Azure), you need to evaluate and detect each of the techniques that lead up to it.

We need to apply a series of Azure best practices to detect abnormal behavior at runtime, as well as earlier in the application lifecycle pipeline. We need to ensure that no misconfigurations exist in our Infrastructure-as-Code (IaC) templates by scanning templates and images in the CI/CD pipeline. We also need to build guardrails, such as limited-trust user credentials provided on a per-user basis. We will try to address each of these best practices below.

Before we start looking at Azure threat detection tools, such as Microsoft Defender and Azure Sentinel, it’s important to set up good hygiene practices.

Ensure users have unique credentials (no credential sharing).
Enforce Multi-Factor Authentication (MFA).
Limit user permissions following the least-privilege principle.
Limit which connections are allowed in and out of our Azure environment.

Let’s walk through the Azure Threat Research Matrix.

Reconnaissance

Tools like Nmap, Metasploit, SQLMap, and many others have no real capability to scan at the Azure infrastructure level. Other tools, like ScoutSuite or aws-extended-cli, focus on cloud environments and can help you assess your security posture.

So how exactly do attackers find out which hosts are running a public-facing application, or identify existing vulnerabilities?

IP Discovery

Public IP addresses allow Internet resources to communicate inbound to Azure resources. You can view the IP address assigned to a resource by checking its virtual network interface, or by running az network public-ip show in the Azure CLI.
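As a minimal sketch, the helpers below wrap that command. They also list every public IP in the subscription together with its SKU, which makes Basic SKU addresses easy to spot. The sketch assumes the Azure CLI is installed and authenticated with az login. The function names are hypothetical; the az subcommands and the --query and --output options are standard Azure CLI features.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the Azure CLI is installed and logged in (az login).

# List every public IP in the current subscription with its resource group and SKU.
# Basic SKU addresses are open by default, so they deserve a closer look.
list_public_ips() {
  az network public-ip list \
    --query "[].{name:name, address:ipAddress, group:resourceGroup, sku:sku.name}" \
    --output table
}

# Show the full details of a single public IP resource.
show_public_ip() {
  local resource_group="$1" name="$2"
  az network public-ip show --resource-group "$resource_group" --name "$name"
}

# Usage:
#   list_public_ips
#   show_public_ip myResourceGroup myPublicIP
```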

In Microsoft Azure, public IP addresses are created with a specific pricing tier, or Stock Keeping Unit (SKU). Public IP addresses on the ‘Basic’ SKU are open by default. Network Security Groups (NSGs) can restrict their inbound or outbound traffic, but NSGs are only recommended, not required. Failing to configure NSGs exposes our resources to adversaries who plan to spread ransomware.

An NSG in Azure is a standardized set of security rules, similar to an Access Control List (ACL). It allows or denies network traffic to resources in a virtual network, such as virtual machine instances. NSGs can be associated with subnets or with the network interfaces of individual virtual machines in those subnets.

The Standard SKU follows a “secure by default” model: it is closed to inbound traffic until an NSG explicitly allows it. This is an improvement, but we must still apply the least-privilege concept. That means allowing only the inbound and outbound traffic our applications need to operate, and blocking all other traffic via NSGs.
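The following sketch shows one way to apply this with the Azure CLI. It creates a Standard SKU public IP and an NSG rule that admits only HTTPS from the Internet. The helper names, the priority value, and the choice of port 443 are assumptions for illustration; adjust them to the traffic your application actually needs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; resource names passed in are placeholders.

# Create a Standard SKU public IP, which is closed to inbound traffic
# until an NSG explicitly allows it.
create_standard_public_ip() {
  local resource_group="$1" name="$2"
  az network public-ip create \
    --resource-group "$resource_group" \
    --name "$name" \
    --sku Standard \
    --allocation-method Static
}

# Allow only HTTPS (TCP 443) from the Internet. Other Internet traffic falls
# through to the NSG's built-in DenyAllInBound rule (priority 65500).
allow_https_inbound() {
  local resource_group="$1" nsg_name="$2"
  az network nsg rule create \
    --resource-group "$resource_group" \
    --nsg-name "$nsg_name" \
    --name Allow-HTTPS-Inbound \
    --priority 100 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes Internet \
    --destination-port-ranges 443
}

# Usage:
#   create_standard_public_ip myResourceGroup myPublicIP
#   allow_https_inbound myResourceGroup myNSG
```

Relying on the default deny rule, rather than adding explicit deny rules, keeps the rule set short. Each new Allow rule is then a deliberate exception that is easy to review.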

Port Mapping

You can view the open ports on a virtual machine by viewing the NSGs assigned to its virtual network interface. The process is similar: run az network nsg show from the Azure CLI.
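To see which ports are actually reachable, it helps to include Azure's default rules and to check the effective rules on a network interface. Here is a minimal sketch with hypothetical helper names. Effective rules combine the subnet NSG and the NIC NSG, and are generally only returned while the VM is running.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reviewing which ports an NSG leaves open.

# List all rules on an NSG, including Azure's built-in default rules.
list_nsg_rules() {
  local resource_group="$1" nsg_name="$2"
  az network nsg rule list \
    --resource-group "$resource_group" \
    --nsg-name "$nsg_name" \
    --include-default \
    --output table
}

# Show the effective security rules on a network interface
# (subnet NSG and NIC NSG combined).
list_effective_nsg() {
  local resource_group="$1" nic_name="$2"
  az network nic list-effective-nsg --resource-group "$resource_group" --name "$nic_name"
}

# Usage:
#   list_nsg_rules myResourceGroup myNSG
#   list_effective_nsg myResourceGroup myVMNic
```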

VMs receive basic infrastructure services such as DHCP, DNS, IMDS, and health monitoring through the virtualized host IP addresses 168.63.129.16 and 169.254.169.254. These are the only virtualized IP addresses used for this purpose, and they are the same in all regions. AWS and GCP also use the link-local address 169.254.169.254 for their metadata services.

By default, these services aren’t subject to the configured NSGs unless targeted by service tags specific to each service. To override this basic infrastructure communication, you can create a security rule to deny traffic by using the following service tags on your NSG rules:

AzurePlatformDNS
AzurePlatformIMDS
AzurePlatformLKM
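For example, an outbound deny rule built on the AzurePlatformIMDS tag could look like the following sketch. The helper name and priority are hypothetical; pick a priority that fits your existing rules. Anything that relies on IMDS will stop working behind this rule, including managed identities that request tokens.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block outbound access to the Instance Metadata Service
# using the AzurePlatformIMDS service tag.
# Caution: workloads that depend on IMDS (e.g., managed identity token
# requests) break once this rule is applied.
deny_imds_outbound() {
  local resource_group="$1" nsg_name="$2"
  az network nsg rule create \
    --resource-group "$resource_group" \
    --nsg-name "$nsg_name" \
    --name Deny-IMDS-Outbound \
    --priority 200 \
    --direction Outbound \
    --access Deny \
    --protocol '*' \
    --destination-address-prefixes AzurePlatformIMDS \
    --destination-port-ranges '*'
}

# Usage: deny_imds_outbound myResourceGroup myNSG
```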

Screenshot: Properties of an NSG rule

We also need to reduce the exposure associated with an attack by limiting the uptime (total time) that a specific port is open on your host Virtual Machine (VM). Realistically speaking, in the case of port scanning attacks, we only need ports to stay open for a very short amount of time – just enough to achieve our basic maintenance tasks. Just-In-Time VM Access can be used to limit the amount of time that these ports stay open on your Azure VMs.

Just-In-Time VM Access leverages NSG rules to enforce a secure configuration and access pattern.

Initial access

Two of the most common attack methods are the Remote Desktop Protocol (RDP), used to access Windows hosts, and the Secure Shell (SSH) protocol, used for Linux systems. We wrote an earlier blog post on securing SSH connections for EC2 instances on AWS, and many of those considerations also apply to Microsoft Azure VMs.

Although both protocols are used to remotely access a host endpoint, there are some clear design differences between them. SSH provides its own key-based authentication and is often exposed directly. RDP, by contrast, is generally recommended to sit behind additional layers such as a Virtual Private Network (VPN) and Multi-Factor Authentication (MFA). In either case, the security of these access methods comes down to three things: how you manage your credentials, how you expose the endpoint, and how many layers of security you add, even if that affects the user experience.
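One concrete layer is to stop exposing SSH and RDP to the whole Internet. The sketch below admits both ports only from an administrator network range. The helper name and priority are hypothetical, and 203.0.113.0/24 in the usage line is a documentation-only range; substitute your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow SSH (22) and RDP (3389) only from an admin CIDR.
# Traffic to these ports from anywhere else falls through to DenyAllInBound.
allow_admin_access() {
  local resource_group="$1" nsg_name="$2" admin_cidr="$3"
  az network nsg rule create \
    --resource-group "$resource_group" \
    --nsg-name "$nsg_name" \
    --name Allow-Admin-SSH-RDP \
    --priority 110 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes "$admin_cidr" \
    --destination-port-ranges 22 3389
}

# Usage: allow_admin_access myResourceGroup myNSG 203.0.113.0/24
```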

Valid credentials

As stated above, adversaries or disgruntled former employees could log into Azure AD from outside the organization using valid credentials. By logging in with valid credentials to an account or service principal, the adversary assumes all privileges of that account or service principal. If the account is privileged, this may lead to other tactics, such as persistence or privilege escalation.

Active Directory protocols are widely known, and Active Directory remains a prime target of ransomware attacks. Organizations continue to shift to cloud services such as Azure AD because they need better security and data management. Even so, there are some gaps worth focusing on.

Prevent brute-force attacks on RDP instances

To secure your RDP instance, we strongly recommend configuring an account lockout threshold. If no threshold is configured, an attacker can “brute-force” the instance until they successfully take over the account. In practice, the attacker uses an automation tool to submit many passwords in a row until the right one is “guessed.” An individual typing passwords by hand couldn’t do this at that scale.

Setting the lockout threshold to something like 10 failed login attempts in 10 minutes should make this kind of attack impractical. Microsoft Azure also provides a tool called Identity Protection natively within Azure AD, which is designed to identify potentially risky behavior around authentication events. Users with an Azure AD Premium P2 license can follow its remediation steps to check for suspicious activity and take action to prevent a malicious account takeover.
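For local accounts on a Windows VM, the 10-attempts-in-10-minutes threshold could be applied remotely through Run Command, as in this sketch. It assumes the Windows net accounts lockout switches and uses a hypothetical helper name. Domain accounts and Azure AD accounts are governed by their own lockout settings (Group Policy and Azure AD smart lockout, respectively), not by this command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set a local account lockout policy on a Windows VM.
#   /lockoutthreshold:10  lock after 10 failed attempts
#   /lockoutwindow:10     count failures within a 10-minute window
#   /lockoutduration:10   keep the account locked for 10 minutes
set_rdp_lockout_policy() {
  local resource_group="$1" vm_name="$2"
  az vm run-command invoke \
    --resource-group "$resource_group" \
    --name "$vm_name" \
    --command-id RunPowerShellScript \
    --scripts "net accounts /lockoutthreshold:10 /lockoutwindow:10 /lockoutduration:10"
}

# Usage: set_rdp_lockout_policy myResourceGroup myWindowsVM
```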

Password spray attacks on Single Sign On (SSO)

Password spraying is a type of brute-force attack in which a bad actor tries a small set of common or default passwords against a long list of usernames on the application.

For example, an attacker will use one password (e.g., Sysdig@123) against many different accounts on the application to avoid account lockouts that would normally occur when brute forcing a single account with many passwords.
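Here is a minimal detection sketch that shows what this pattern looks like in data. It assumes a hypothetical CSV export of sign-in events with the columns timestamp,source_ip,username,result; this is not an Azure log format. The helper flags any source IP whose failed sign-ins span many distinct usernames. That is the signature of spraying, as opposed to brute-forcing a single account.

```shell
#!/usr/bin/env bash
# Hypothetical detector. Assumed input (CSV, one event per line):
#   timestamp,source_ip,username,result      with result = success | failure
# Prints "source_ip distinct_usernames" for every IP whose failed sign-ins
# hit at least THRESHOLD distinct accounts (default 10).
detect_password_spray() {
  local threshold="${1:-10}"
  awk -F',' -v threshold="$threshold" '
    $4 == "failure" {
      pair = $2 SUBSEP $3
      if (!(pair in seen)) {       # count each (ip, username) pair only once
        seen[pair] = 1
        users[$2]++
      }
    }
    END {
      for (ip in users)
        if (users[ip] >= threshold) print ip, users[ip]
    }
  ' | sort
}

# Usage: detect_password_spray 20 < signins.csv
```

Counting distinct usernames per source, rather than raw failures, keeps a single user who mistypes their password repeatedly from being flagged as a spray.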

SSO users can sign into Azure AD automatically without entering a password. This is convenient, because it cuts down on manually reusing weak passwords. However, the protocol used for SSO has been shown to allow single-factor brute-force attacks. This weakness is not limited to SSO, so your passwords still need to be strong.

Microsoft emphasizes multi-factor authentication (MFA) as an additional layer of protection when logging in. Although MFA cannot wholly prevent password breaches, it’s a good starting point. Gaps remain, though. For instance, when MFA is deployed, some support providers can no longer connect through Remote PowerShell (RPS). Teams therefore often fall back on an admin account created in Office 365 for easy access. These accounts, which have no MFA protection, are exactly what attackers are looking for.

This case also happened in the high-profile Deloitte breach. Bad actors compromised the global email server of the company using an elevated admin account that had no “two-step” verification.

Execution

Microsoft has designed its Azure AD solution to make it harder for attackers to move laterally or steal our data.

This matters because, to succeed with ransomware on the Azure platform, the attacker needs to move inside the cloud account to find where the data is located.

Azure AD implements granular security controls by default. However, compared with the traditional Microsoft AD design, it has gaps that open up new attack paths.

Organizational Unit (OU) lack of support

Unlike AD, Azure AD does not support organizational units (OUs). Grouping users and devices is the easiest way to manage and monitor access in large organizations, so the lack of OU support increases the administrative load on IT teams. Overloaded IT teams often overlook repetitive tasks, and security gets compromised.

Group Policy Object (GPO) lack of support

There are also fewer controls over the devices we connect to the network. Unlike traditional Microsoft AD, Azure AD does not support Group Policy, which helps administrators manage all devices on the network. Without Group Policy Objects (GPOs), managing device settings in a large organization is very difficult.

Once ransomware is downloaded, it scans local and network systems for files to encrypt. Without Group Policy, there are fewer network controls to prevent that lateral movement.

With compromised credentials, simply logging in can be the easiest way for a cyber attacker to get into a target organization and drop a malicious payload. Next, we will look at how privilege escalation attempts can occur.

Privilege escalation

An adversary may escalate their privileges if their current account is eligible for role activation via Privileged Identity Management (PIM). The PIM service was designed for Azure AD, and it enables you to manage, control, and monitor access to important resources in your organization. It’s also worth looking at how we can use Role Based Access Controls (RBAC) to prevent access to specific services and privilege escalation within those services.

Privileged Identity Management (PIM)

Minimizing the number of people who have access to secure information or resources has two benefits. It reduces the chance of a malicious actor getting access to our environment in the first place, and it prevents authorized users from inadvertently impacting sensitive resources in Microsoft Azure.

PIM works as a kind of Privileged Access Management (PAM) solution for Azure AD. Rather than granting standing privileges, it provides just-in-time, time-bound role activation with optional approval and MFA, along with access reviews and an audit history of privileged activity. Privileged accounts give designated users access above and beyond what standard users have, which is why their misuse needs to be prevented.

Role-Based Access Controls (RBAC)

Azure RBAC is the authorization system used to manage access to Azure resources. With RBAC, users only get permissions for the applications and services their role requires. For instance, a SQL DBA doesn’t necessarily need access to Azure Kubernetes Service (AKS), and certainly doesn’t need full access to create or delete AKS clusters. Scoping roles this way limits the blast radius if an attacker gains access to that user account.

Here are three common Azure built-in roles to better understand how this works:

Read-Only Access Role (Reader)
This is a strongly recommended role for teams looking to implement “limited trust” permissions. An attacker who compromises an account with this role can still view sensitive information, but only through read-only actions. They have no permissions to make changes in Azure. They can map new targets and address spaces through VNets, but they can’t elevate their own permissions, which is possible with the Owner role.
Owner Access Role
This role should be used sparingly. The Owner can manage all resources and grant permissions to any resource. This makes the role of great interest to threat actors who want to give themselves access to other services within Microsoft Azure. A compromised Owner account is the worst-case scenario for ransomware on Azure.
Contributor Role
This role can create and manage all types of Azure resources, but it cannot grant access to other users. Because a Contributor can still modify and delete resources, including those that hold data, this role should also be limited to the teams and scopes that genuinely need it.

az role assignment create --assignee "<user-principal-name>" \
--role "Reader" \
--resource-group "technical-marketing"

Assigns the Reader role to the specified user (replace <user-principal-name> with their sign-in name) at the ‘technical-marketing’ resource-group scope.
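Because Owner is the role attackers want most, it is worth periodically reviewing who holds it. As a minimal sketch (hypothetical helper name), the following lists every Owner assignment visible in the current subscription, using a JMESPath filter on the output of az role assignment list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list all Owner role assignments in the subscription
# so unexpected or stale Owners can be reviewed and removed.
list_owner_assignments() {
  az role assignment list \
    --all \
    --query "[?roleDefinitionName=='Owner'].{principal:principalName, type:principalType, scope:scope}" \
    --output table
}

# Usage: list_owner_assignments
```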

Persistence

Assume the attacker has already gained access through one of the techniques mentioned earlier, such as a brute-force or password spraying attack, and has managed to escalate their permissions. They can now modify the rules in an NSG to open additional ports and carry out lateral movement.
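One way to catch this is to review NSG rule changes in the Azure Activity Log. The sketch below uses a hypothetical helper name and filters the log for security rule write operations over a look-back window. Microsoft.Network/networkSecurityGroups/securityRules/write is the Azure Resource Manager operation for creating or updating an NSG rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show who created or changed NSG rules recently.
#   $1 = look-back window accepted by --offset (default: 1d)
list_nsg_rule_changes() {
  local offset="${1:-1d}"
  az monitor activity-log list \
    --offset "$offset" \
    --query "[?operationName.value=='Microsoft.Network/networkSecurityGroups/securityRules/write'].{time:eventTimestamp, caller:caller, resource:resourceId, status:status.value}" \
    --output table
}

# Usage: list_nsg_rule_changes 7d
```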

Microsoft Defender for Cloud provides threat detection and response capabilities for cloud workloads and integrates with Microsoft’s Extended Detection and Response (XDR) offering. Defender for Cloud can monitor for brute-force attempts like password spraying. It can also monitor for adversaries who disable security controls, which is often part of the human-operated ransomware (HumOR) attack chain.

In a traditional on-premises environment, it’s important to isolate endpoints to prevent disk encryption from spreading. Endpoint isolation means segregating at-risk computers or other endpoint devices from the rest of the network. This mitigates lateral movement and any data exfiltration that could happen while an infected device remains on the network. Once an endpoint is isolated, security teams can proactively remove the threat and run remediation and investigation processes. As soon as all of the security issues are addressed, the endpoint can be re-added to the network.

Since cloud VMs can be spun up and torn down easily, isolation is not our biggest concern from a persistence point of view. Instead, we need to prioritize scenarios where the adversary attempts defense evasion, such as Event Log Clearing. This matters most where they try to destroy the Security Event log and PowerShell Operational logs needed for intrusion detection and incident response forensics.

Disabling security controls

As mentioned earlier, the attacker could also try to disable the security tools and controls associated with certain NSGs, thereby evading detection.

In general, Azure Defender’s anti-malware protection should protect the endpoints, email servers, and network automatically, as it is designed to mitigate known ransomware. There may be cases, however, where newer ransomware variants are able to bypass such protections and successfully infect target systems. Falco can help us alert on privilege escalation attempts on the host VM or within containerized workloads before the attacker is able to disable those security controls.

In Linux, non-root users can use some binaries and commands to escalate to root privileges if the Set User ID (SUID) or Set Group ID (SGID) bit is set on them. A large number of executables can allow privilege escalation this way.
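Before writing detection rules, it helps to know which SUID and SGID files already exist on a host. Comparing that inventory over time, or against a list like GTFOBins, makes new or unexpected entries stand out. This is a minimal sketch with a hypothetical helper name; it uses the standard find permission tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files with the SUID or SGID bit set under
# a directory (default /), without crossing into other filesystems.
#   -perm -4000 : SUID bit set
#   -perm -2000 : SGID bit set
find_setid_files() {
  local root="${1:-/}"
  find "$root" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null | sort
}

# Usage: find_setid_files /usr > setid_baseline.txt
```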

GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems. In this specific use case, we will focus on a Falco rule that detects changes to the SUID and SGID bits. As a best practice, you should also consider creating rules around the other binaries listed in GTFOBins.

- rule: Set Setuid or Setgid bit
  desc: >
    When the setuid or setgid bits are set for an application,
    this means that the application will run with the privileges of the owning user or group respectively.
    Detect setuid or setgid bits set via chmod
  condition: >
    consider_all_chmods and chmod and (evt.arg.mode contains "S_ISUID" or evt.arg.mode contains "S_ISGID")
    and not proc.name in (user_known_chmod_applications)
    and not exe_running_docker_save
    and not user_known_set_setuid_or_setgid_bit_conditions
  output: >
    Setuid or setgid bit is set via chmod (fd=%evt.arg.fd filename=%evt.arg.filename mode=%evt.arg.mode user=%user.name user_loginuid=%user.loginuid process=%proc.name
    command=%proc.cmdline container_id=%container.id container_name=%container.name image=%container.image.repository:%container.image.tag)
  priority: NOTICE
  tags: [process, mitre_persistence]


SUID and SGID bits can be set on files, and SGID can also be set on directories. When the SUID bit is set on an executable, other users run it with the access permissions of the file’s owner (e.g., root). This is how the passwd command lets a regular user update the password file, which is owned by root.

As a result, it’s critical that we implement good user access hygiene, including MFA, strong passwords, and limited-trust rules. We also need additional detection tools like Falco to identify users who are making unusual forced login attempts or trying to escalate privileges.

The goal is to prevent these actions at the earliest stage, before they escalate and eventually culminate in ransomware on Azure.

Spreading ransomware via rootkit

A rootkit is a collection of malicious computer software created to gain access to a target computer and often hides its existence or the existence of other software. The term rootkit is a concatenation of “root” (the privileged account on Unix-like operating systems) and “kit” (which refers to the software components that implement the tool).

Many users are probably already familiar with using open source Falco to detect attempts to run a workload or service as root. However, it’s best for us in this blog to focus on more ransomware-specific behaviors. In many cases, rootkits will write non-device files within the /dev directory to evade detection and ensure persistence.

It’s really helpful to build Falco rules to detect this kind of behavior:

- rule: create_files_below_dev
  desc: creating any files below /dev other than known programs that manage devices. Some rootkits hide files in /dev.
  condition: (evt.type = creat or evt.arg.flags contains O_CREAT) and proc.name != blkid and fd.directory = /dev and fd.name != /dev/null
  output: "File created below /dev by untrusted program (user=%user.name command=%proc.cmdline file=%fd.name)"
  priority: WARNING

Exfiltration

Adversaries will attempt to exfiltrate/steal sensitive company data by sending it to a Command & Control (C2) server.

If we prevent connections to those known bad IPs/domain names associated with blacklisted C2 servers, we can prevent data loss specifically to those destinations.

Threat feeds

Falco is an open source solution that can be used to detect unusual outbound connections to those known malicious IPs. To complete this step, we will write this rule to a file under /etc/falco/malicious_ips_rule.yaml.

In the following commands, we use cURL to download a compressed IP blocklist and drop comment lines. For feeds that include a per-IP count of reporting sources, we also drop entries reported by five or fewer sources. We then generate a Falco list from the result. You can use any feed; the free example used here is FeodoTracker, which is maintained by the security analysts at abuse.ch.

We will save the generated Falco list into a file named malicious_ips_list.yaml.

curl --compressed https://feodotracker.abuse.ch/downloads/ipblocklist.txt 2>/dev/null \
  | grep -v -e "#" -e "^$" | grep -v -E "\s[1-5]$" | cut -f 1 | sed "s/.*/'&',/g" \
  | tr '\n' ' ' | sed "s/, $//" \
  | sed "s/.*/- list: malicious_ip_list\n  items: [&]/" > malicious_ips_list.yaml

Once we have written the malicious IP list to malicious_ips_list.yaml, we can create a Falco rule that detects connections to or from the IP addresses in that list.

- rule: Malicious IPs
  desc: Detect connections to/from a malicious IP
  condition: inbound_outbound and (fd.sip in (malicious_ip_list) or fd.cip in (malicious_ip_list))
  output: >
    Suspicious connection to/from a malicious IP detected (command=%proc.cmdline connection=%fd.name user=%user.name container_id=%container.id)
  priority: WARNING
  tags: [network]
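You may want to pre-process other feeds, or test the conversion without network access. In that case, the same transformation can be written as a small function that reads any blocklist on standard input. This is a sketch with a hypothetical helper name. It skips comments and blank lines, keeps only IPv4-looking values from the first column, and removes duplicates. When a minimum is given and a second column holds a per-IP source count, it also drops entries at or below that minimum.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a plain-text IP blocklist on stdin into a Falco list.
#   $1 = Falco list name (e.g., malicious_ip_list)
#   $2 = optional minimum source count; when > 0, lines whose second
#        column is <= this value are dropped
make_falco_ip_list() {
  local list_name="$1" min_count="${2:-0}"
  awk -v name="$list_name" -v min="$min_count" '
    { sub(/\r$/, "") }                          # tolerate CRLF line endings
    /^[ \t]*#/ || NF == 0 { next }              # skip comments and blank lines
    min > 0 && NF >= 2 && ($2 + 0) <= min { next }
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && !($1 in seen) {
      seen[$1] = 1
      sep = (count++ > 0) ? ", " : ""
      items = items sep "\047" $1 "\047"        # \047 is a single quote
    }
    END { printf "- list: %s\n  items: [%s]\n", name, items }
  '
}

# Usage:
#   curl --compressed https://feodotracker.abuse.ch/downloads/ipblocklist.txt 2>/dev/null \
#     | make_falco_ip_list malicious_ip_list > malicious_ips_list.yaml
```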

If you are interested in this topic, you can read more about detecting TOR network connections with Falco.

Network policies

We discussed using NSGs at the infrastructure level. NSGs are critical, but it’s important that you also apply the same zero-trust concept to Azure services like AKS.

Within Kubernetes, you can apply a default-deny policy that drops all traffic not explicitly allowed by your other policies. That way, all excess traffic is dropped, whether or not the threat feeds flag it. This is a great practice that should also be applied to NSGs.

Here’s the example for NetworkPolicies:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Backup of sensitive user data

Ransomware attacks are deliberately designed with the intention of either encrypting or destroying sensitive data and/or systems. If the data was not valuable to the end user, there would be no incentive for businesses to pay the ransom release fee.

Aside from exfiltrating sensitive data to third-party C2 servers, adversaries will target backups of your data so that the data can only be recovered by paying the ransom. If backups are encrypted or destroyed, the organization has no copy of the data.

Azure Backup protects your backup data both in transit and at rest. The backup is stored within Azure storage, so the attacker has no direct access to the sensitive backup storage. For VMs, Azure Backup takes snapshots through the Azure fabric. The guest operating system, and therefore an attacker inside it, has no involvement in that process other than quiescing the workload for application-consistent backups.

az backup protection backup-now \
    --resource-group testRG \
    --vault-name testRecoveryServicesVault \
    --container-name nigelsVM \
    --item-name nigelsVM \
    --backup-management-type AzureIaaSVM \
    --retain-until 11-12-2022

Example: backing up the virtual machine ‘nigelsVM’ and setting the recovery point to expire on Dec. 11, 2022 (the --retain-until value uses the day-month-year format).
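An on-demand backup like this only works for a VM that is already protected by the vault. As a sketch, protection could be enabled first as follows. The helper name is hypothetical. DefaultPolicy is the VM backup policy a new Recovery Services vault typically includes, so substitute your own policy name if you use a custom one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable Azure Backup protection for a VM so that
# on-demand backups (az backup protection backup-now) can run against it.
enable_vm_backup() {
  local resource_group="$1" vault_name="$2" vm_name="$3"
  local policy_name="${4:-DefaultPolicy}"
  az backup protection enable-for-vm \
    --resource-group "$resource_group" \
    --vault-name "$vault_name" \
    --vm "$vm_name" \
    --policy-name "$policy_name"
}

# Usage: enable_vm_backup testRG testRecoveryServicesVault nigelsVM
```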

Backup management should be considered the last line of protection in case all of the steps above fail. Suppose an organization is hit by a ransomware attack. If it has already identified its most critical systems and applied best practices throughout the kill chain, incident response will be easier, and the business can quickly roll back to a snapshot taken before the incident occurred.

Backup management does not prevent a ransomware attack on Azure, but it mitigates the impact on your organization. In fact, CISA recommends that individuals and businesses use what is known as a 3-2-1 strategy for backing up sensitive data.

Here’s what the 3-2-1 backup rule involves:

3 – Keep three copies of any important file: one primary and two backups.
2 – Keep the files on two different media types to protect against different types of hazards.
1 – Store one copy offsite (i.e., it shouldn’t reside in the potentially infected cloud environment).

Conclusion

Organizations can prevent (in most cases) initial access, as well as the lateral movement, by applying the principle of least privilege. The principle of least privilege is based on the assumption that every person, every device, every application, etc., is a potential threat to the organization, and therefore should only be granted the access permissions they need to complete a particular job function.

Least privilege can be applied to the images we scan and allow into our cluster, the network traffic we permit/deny via NetworkPolicies, as well as the connections we establish/prevent through NSGs. It might sound simple to enforce the above principle of least privilege, but you really need to ensure that your teams are actively managing the security posture of your Azure environment by assigning clear ownership of responsibilities for:

Monitoring security posture
Mitigating risks to assets

By automating and simplifying these tasks, you increase your ability to prevent ransomware attacks on Azure at various stages of the Azure Threat Research Matrix. Doing these tasks manually increases the risk of human error and of failing to adapt to organizational changes. It also doesn’t transfer well if a new admin has to take over your role in the future.

Sysdig Secure provides end-to-end security for Microsoft Azure environments, as well as the containerized workloads that run inside AKS. Since Sysdig Secure also monitors the security posture of multi-cloud and hybrid-cloud environments, admins can easily hand over responsibility to a new admin in the future, and the same rules and monitoring are inherited without complex reconfiguration and training.

We only discussed ransomware on Azure in this blog post, and the instructions will vary for AWS and GCP. Being able to enforce standardized security controls and monitoring for each of these environments via a single solution will certainly reduce the headache of security operations.