AWS Redshift Security

AWS Redshift is a fast, fully managed, petabyte-scale data warehouse service that lets users store and analyze their data cost-effectively using business intelligence tools such as Amazon QuickSight, Microsoft’s Power BI, or Tableau, among others.

In this article, we will look at the security features available in AWS Redshift and how they can be used to secure data and prevent unauthorized access.

Importance of Redshift Security

Redshift gives organizations the ability to protect sensitive data. Amazon Redshift offers all of its security features at no additional cost, and these features are designed to satisfy demanding security, privacy, and compliance requirements.

Redshift security supports:

  • Multi-factor authentication.
  • Granular access on Amazon VPC.
  • Built-in identity management and federation for single sign-on.

Other features supported by Redshift include:

  • A Redshift data warehouse cluster can be segregated into a private virtual network using Amazon VPC, ensuring the data is protected against external attacks.
  • Restricting network access to the data warehouse with firewall rules limits who can reach the cluster, which is why firewall configuration is essential during setup.
  • AWS IAM is used to authenticate requests.
  • Amazon recently introduced Role-Based Access Control (RBAC). This feature simplifies security permissions and controls end-user access to data.
  • AWS CloudTrail monitors and records account activity across the AWS infrastructure, including Redshift. Combined with Redshift’s own audit logging, this makes logs readily available for audit and analysis, including:
    • Connection attempts,
    • Queries on the data warehouse, and
    • Changes made on the warehouse.
  • Redshift uses dynamic data masking to selectively mask personal information at query time.

Security Measures in Redshift

As mentioned earlier, all Amazon Redshift security features are offered at no additional cost. Such features include:

  • Encryption
  • User authentication
  • Network security

Encryption of Data at Rest

Amazon Redshift secures data at rest with encryption. It can optionally encrypt user data as it is written to storage and decrypt it when a user with proper authentication and access permissions reads it. This encryption of data at rest happens entirely on the server side.

Client-side encryption is another option. Users can encrypt data before loading it into the system and decrypt it after retrieving it. For data loaded from Amazon S3, server-side encryption with S3-managed keys (SSE-S3) or with customer-managed encryption keys (CMEK) controlled by AWS Key Management Service (KMS) can also be used.

Encryption of Data in Transit

Data in transit includes data moved by commands such as COPY and UNLOAD, or by operations such as backup and restore, between Redshift and Amazon DynamoDB or Amazon Simple Storage Service (Amazon S3). For data in transit within the AWS Cloud, Amazon Redshift uses hardware-accelerated SSL (Secure Sockets Layer) to communicate with these services. SSL keeps connections secure and safeguards sensitive data by establishing an encrypted link between the two endpoints.
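The same principle can be applied to the link between a SQL client and the cluster. The following is a minimal sketch, assuming a libpq-compatible PostgreSQL driver (for example psycopg2) is used to connect; the host name, user, and certificate path are hypothetical placeholders, and the function only builds the connection string rather than opening a connection.

```python
def build_tls_dsn(host: str, database: str, user: str,
                  port: int = 5439,
                  ca_bundle_path: str = "/path/to/ca-bundle.crt") -> str:
    """Return a libpq-style connection string that insists on verified TLS.

    The password is deliberately left out; supply it through the driver or
    the environment rather than embedding it in a string.
    """
    params = {
        "host": host,
        "port": str(port),
        "dbname": database,
        "user": user,
        # verify-full encrypts the link and also checks the server
        # certificate and host name, so the client refuses an impostor.
        "sslmode": "verify-full",
        # Hypothetical path: point this at the CA bundle you trust.
        "sslrootcert": ca_bundle_path,
    }
    return " ".join(f"{key}={value}" for key, value in params.items())


# Hypothetical endpoint, for illustration only.
print(build_tls_dsn("example-cluster.example.com", "dev", "analyst"))
```

Choosing `verify-full` over a weaker mode is a design choice: it trades a little setup effort (distributing the CA bundle) for protection against connecting to the wrong server.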

Access Control & User Authentication

Signing up for AWS is a prerequisite for using any of its products, including Redshift. The AWS account that creates a resource is referred to as the resource owner. More precisely, the owner is the AWS account of the principal entity (the root account, an IAM user, or an IAM role) that authenticates the request to create the resource.

A user with administrative permissions can create clusters and perform other administrative tasks, such as delegating permissions. Delegation can allow a user to create, delete, modify, and reboot Amazon Redshift clusters for another AWS account.

Below is an example:

[Image: Redshift configuration example]

Connections to Amazon Redshift data storage, on the other hand, must be authenticated with user credentials in order to safeguard data against unwanted access. The main parameters required when authenticating a connection are:

  • Host
  • Port
  • Database
  • AccessKeyID
  • SecretAccessKey
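As a rough illustration, the five parameters above can be gathered into one validated object before they are handed to a driver. This is a sketch under stated assumptions: the class name, the helper, and the `REDSHIFT_*` environment variable names are hypothetical, while `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` follow the usual AWS naming.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class RedshiftConnectionSettings:
    host: str
    database: str
    access_key_id: str
    # repr=False keeps the secret out of logs and tracebacks.
    secret_access_key: str = field(repr=False)
    port: int = 5439  # Redshift's default port

    def __post_init__(self) -> None:
        required = ("host", "database", "access_key_id", "secret_access_key")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError("missing connection settings: " + ", ".join(missing))
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> RedshiftConnectionSettings:
    """Read the settings from environment variables instead of source code."""
    env = os.environ if env is None else env
    return RedshiftConnectionSettings(
        host=env.get("REDSHIFT_HOST", ""),          # hypothetical variable name
        database=env.get("REDSHIFT_DATABASE", ""),  # hypothetical variable name
        access_key_id=env.get("AWS_ACCESS_KEY_ID", ""),
        secret_access_key=env.get("AWS_SECRET_ACCESS_KEY", ""),
        port=int(env.get("REDSHIFT_PORT", "5439")),  # hypothetical variable name
    )
```

Reading credentials from the environment and hiding the secret from `repr` are small habits that reduce the chance of keys leaking into version control or log files.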

Network Security and Firewall Configurations

At the most basic level, network security involves setting up a network firewall. A firewall monitors, inspects, and filters incoming and outgoing traffic in a network. AWS lets you create firewall rules that provide fine-grained control over network traffic.

AWS Network Firewall works together with AWS Firewall Manager so that you can build policies and apply them across your AWS accounts and VPCs. AWS Network Firewall lets you filter traffic at the perimeter of your VPC.

AWS Firewall Manager manages multiple AWS Network Firewall deployments. One important relationship to note between a VPC and AWS Network Firewall is that, once the route tables are configured, requests entering and leaving the VPC through gateways are routed through AWS Network Firewall. Here are a few things to remember when creating a firewall:

  • Select the VPC you want to protect, then link it to a firewall policy that specifies the protection to apply.
  • In each availability zone that needs protection, select a subnet to house the firewall endpoint.
  • To specify the behavior of the firewall policy, add custom rule groups.
  • Keep in mind that Network Firewall rule groups contain the rules for inspecting traffic entering and leaving the VPC.
  • Update route tables to route both incoming and outgoing traffic through the firewall endpoints.

For more detail on creating an AWS Network Firewall, see the AWS Network Firewall documentation.

Redshift Security Best Practices

General best practices

Like any service, Amazon Redshift works best when a few established practices are followed. Here are some best practices that help make the most of it:

  • Configuring Access: As a prerequisite, one has to sign up for an AWS account. This creates a root user who has access to all AWS services and resources. Amazon recommends assigning administrative access to an administrative user and using the root user only for tasks that require root user access.
  • Firewall Rules & Configurations: Amazon Redshift uses port 5439 by default. If your computer is behind a firewall, Amazon recommends identifying an open port that you can use to connect to the Redshift cluster from a SQL client tool and run queries.

    Note: The port number of an Amazon Redshift cluster cannot be changed once the cluster has been created.

  • Running Queries on Databases: Amazon Redshift is a relational database management system (RDBMS), so it is compatible with other RDBMS applications. It provides the functionality of a typical RDBMS, including online transaction processing (OLTP) functions such as inserting and deleting data, but it is optimized for analysis and reporting on very large datasets. Beyond data operations and structuring choices, such as sort keys, column compression, distribution styles (for example, key-based), and appropriate data types, it is important to keep queries running fast.

    Redshift includes built-in performance logging and monitoring to help detect and improve slow-running queries. It also provides the VACUUM and ANALYZE commands, which reclaim space and refresh the table statistics used by the query planner.

  • Data Processing Flow: When you perform an analytical query, you retrieve, compare, and evaluate a significant amount of data in numerous stages to arrive at a decision.

    Some guidelines to observe include the following:

    • Use federated queries to combine data from relational databases, such as Amazon RDS and Aurora, and from Amazon S3 with data in Redshift.
    • Redshift can then be used to interactively query the data, apply transformations, and insert the results without first moving the source data.
    • With data shares, it is relatively simple and secure to share data at many levels, including databases, schemas, tables, and user-defined SQL functions.
    • Use Amazon Redshift Spectrum to query data stored in Amazon S3 without first loading it into Redshift tables.
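The firewall guidance in the list above (default port 5439, reachable from the SQL client's machine) can be checked before any credentials are involved. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and a successful TCP connection only shows that the port is reachable, not that authentication will succeed.

```python
import socket


def can_reach(host: str, port: int = 5439, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    A False result usually points at a firewall, security group, or
    routing rule between this machine and the cluster.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `can_reach("example-cluster.example.com")` with your own cluster endpoint in place of the hypothetical host name gives a quick yes/no answer from the machine where the SQL client will run.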

Configuring Security Groups and VPCs

A VPC protects access to one’s cluster by using a virtual networking environment. Amazon Redshift has supported two platforms for launching a cluster:

  • EC2-VPC
  • EC2-Classic

To launch a cluster in a VPC, use the EC2-VPC platform. To create a cluster in a VPC:

  • First, set up a VPC or use the default one. Take note of:
    • The VPC identifier
    • Subnets
    • Subnets’ availability zones
  • Create an Amazon Redshift cluster subnet group, using the console or programmatically, to designate which subnets your Amazon Redshift cluster can use in the VPC.
  • Authorize inbound access in a VPC security group that you have associated with the cluster.

Security groups, on the other hand, are used to grant other users inbound access to a Redshift cluster. A security group is first defined and then associated with a specific cluster. To connect using SQL client tools, the security group must allow access over the cluster’s database port. Rules can be defined when the security group is created or added later, which makes it easier to adjust the configuration as requirements change.

Some of the security group best practices include:

  • Authorize only specific IAM principals to create and modify security groups.
  • Create the minimum number of security groups that you need to decrease the risk of error.
  • When you add inbound rules for ports 22 (SSH) or 3389 (RDP) so that you can access your EC2 instances, authorize only specific IP address ranges.
  • Do not open large port ranges.
  • Consider creating network ACLs with rules similar to your security groups to add an additional layer of security to your VPC.
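Some of these practices can be checked mechanically. The sketch below is an illustration only: the `InboundRule` type, the `audit_rule` helper, and the threshold for what counts as a "large" port range are all hypothetical, and the rules are plain Python values rather than objects fetched from AWS.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

ADMIN_PORTS = (22, 3389)  # SSH and RDP
MAX_PORT_SPAN = 10        # hypothetical cut-off for a "large" port range


@dataclass(frozen=True)
class InboundRule:
    cidr: str       # source range, e.g. "203.0.113.0/24"
    from_port: int
    to_port: int


def audit_rule(rule: InboundRule) -> List[str]:
    """Return human-readable findings for one inbound rule (empty if none)."""
    findings = []
    network = ipaddress.ip_network(rule.cidr, strict=False)
    span = rule.to_port - rule.from_port + 1
    if span > MAX_PORT_SPAN:
        findings.append(f"opens {span} ports; narrow the range")
    # A prefix length of 0 (0.0.0.0/0 or ::/0) means "any address".
    if network.prefixlen == 0:
        exposed = [p for p in ADMIN_PORTS if rule.from_port <= p <= rule.to_port]
        if exposed:
            ports = ", ".join(str(p) for p in exposed)
            findings.append(f"admin port(s) {ports} open to any address")
    return findings


rules = [
    InboundRule("0.0.0.0/0", 22, 22),          # SSH open to the world
    InboundRule("203.0.113.0/24", 5439, 5439),  # Redshift port, one range
]
for rule in rules:
    print(rule, audit_rule(rule))
```

The second rule produces no findings because it exposes a single port to a specific address range, which is the shape the best practices above recommend.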

Access Management – AWS IAM Accounts

As implied earlier, an IAM user with administrative permissions can create other users or groups and specify their roles. The defined roles and permissions control access to Redshift resources. To grant or deny permissions to users or groups, IAM uses IAM policies. Policies are defined as JSON documents that specify the actions allowed or denied on specific resources.
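To make the shape of such a policy concrete, here is a minimal sketch that builds one as a Python dictionary and prints it as JSON. It grants the create, delete, modify, and reboot cluster permissions mentioned earlier; the function name and the `Sid` label are illustrative choices, and the wildcard resource is an assumption that should normally be narrowed to specific cluster ARNs.

```python
import json


def cluster_admin_policy(resource: str = "*") -> dict:
    """Build an IAM policy document allowing basic cluster management."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowManageClusters",  # illustrative statement label
                "Effect": "Allow",
                "Action": [
                    "redshift:CreateCluster",
                    "redshift:DeleteCluster",
                    "redshift:ModifyCluster",
                    "redshift:RebootCluster",
                ],
                # "*" matches every resource; prefer a specific ARN.
                "Resource": resource,
            }
        ],
    }


print(json.dumps(cluster_admin_policy(), indent=2))
```

Generating the document in code rather than by hand makes it easier to keep the action list under review and to produce narrower variants for different groups.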

Logging and Monitoring

Amazon Redshift is a data warehouse service that stores and analyzes large volumes of data. Monitoring and logging are therefore essential for keeping clusters running effectively and for troubleshooting.

Logging

Logs from Redshift clusters can be exported to CloudWatch or as files to Amazon S3 buckets for auditing. Audit logging can be enabled from the AWS Management Console as follows:

  • Sign in to the AWS Management Console.
  • Open the Redshift console.
  • On the navigation menu, select the cluster that is to be updated.
  • Under the Properties tab, on the Database Configurations panel, choose Edit Audit Logging.
    • At this point, one can turn on audit logging and choose an S3 bucket or CloudWatch as the destination for the exported logs.
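The same setting can also be applied programmatically. The sketch below assumes the boto3 SDK and its Redshift `enable_logging` operation; the helper names and the cluster identifier are hypothetical. The request is built by a pure function so it can be inspected or tested without AWS access, and boto3 is imported only inside the function that actually sends it.

```python
from typing import Optional, Sequence

LOG_TYPES = ("connectionlog", "userlog", "useractivitylog")


def audit_logging_request(cluster_id: str,
                          destination: str = "cloudwatch",
                          bucket_name: Optional[str] = None,
                          log_exports: Sequence[str] = LOG_TYPES) -> dict:
    """Build the keyword arguments for an enable_logging call."""
    if destination not in ("cloudwatch", "s3"):
        raise ValueError(f"unknown destination: {destination}")
    request = {"ClusterIdentifier": cluster_id,
               "LogDestinationType": destination}
    if destination == "s3":
        if not bucket_name:
            raise ValueError("an S3 destination needs a bucket name")
        request["BucketName"] = bucket_name
    else:
        unknown = sorted(set(log_exports) - set(LOG_TYPES))
        if unknown:
            raise ValueError("unknown log types: " + ", ".join(unknown))
        request["LogExports"] = list(log_exports)
    return request


def enable_audit_logging(request: dict) -> dict:
    """Send the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party, imported lazily so the builder works without it
    return boto3.client("redshift").enable_logging(**request)


# Hypothetical cluster name, for illustration only.
print(audit_logging_request("example-cluster"))
```

Separating the request builder from the call that sends it is a design choice: the validation can run in any environment, while the side effect stays in one small function.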

AWS recommends using CloudWatch, as administering it is easy and it has features for data visualization.

Monitoring

The following are monitored in Redshift:

  • Performance metrics
  • Query and load performance data

These let a user keep track of database and cluster health and performance. Monitoring relies on Amazon CloudWatch. From a security perspective, this visibility is the reason to use these linked products together.

CloudWatch metrics monitor physical aspects of your cluster, such as latency, throughput, and CPU utilization. These metrics are displayed directly in the Amazon Redshift console. With the query and load performance data, one can correlate cluster metrics with specific database query and load events.
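As an illustration of working with these metrics outside the console, the sketch below builds a request for the cluster's CPU utilization and filters the returned datapoints against a threshold. It assumes the `AWS/Redshift` CloudWatch namespace with the `CPUUtilization` metric and `ClusterIdentifier` dimension; the helper names, the 80 percent threshold, and the sample datapoints are hypothetical. The request dictionary is shaped for boto3's CloudWatch `get_metric_statistics` call, which is not invoked here.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def cpu_metric_request(cluster_id: str, hours: int = 1,
                       period_seconds: int = 300) -> dict:
    """Build keyword arguments for a CloudWatch get_metric_statistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }


def busy_datapoints(datapoints: List[dict],
                    threshold_percent: float = 80.0) -> List[dict]:
    """Return datapoints whose average exceeds the threshold, oldest first."""
    hot = [p for p in datapoints if p["Average"] > threshold_percent]
    return sorted(hot, key=lambda p: p["Timestamp"])


# Made-up sample data in the shape CloudWatch returns under "Datapoints".
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"Timestamp": t0 + timedelta(minutes=10), "Average": 91.5},
    {"Timestamp": t0, "Average": 35.0},
    {"Timestamp": t0 + timedelta(minutes=5), "Average": 84.2},
]
for point in busy_datapoints(sample):
    print(point["Timestamp"].isoformat(), point["Average"])
```

The timestamps of the busy periods can then be compared with query and load events to see which workload caused the spike.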

Conclusion

Data protection is made possible by a variety of security mechanisms. Amazon Redshift offers a number of these features without charging extra fees, including network isolation, encryption, data monitoring and logging, dynamic data masking, and Role-Based Access Control.

By utilizing these features, organizations ensure that their data is protected against unauthorized access, modification, or disclosure. Data can also be isolated on a private network on Amazon VPC, adding an extra layer of protection. In addition, organizations can opt to implement IAM policies to restrict end-user access to data, ensuring only authorized users get access to sensitive data.

Here are additional resources you can use to learn more about AWS Redshift security: