Architecting Cloud Instrumentation

Architecting cloud instrumentation to secure a complex and diverse enterprise infrastructure is no small feat. Picture this: you have hundreds of virtual machines, some with specialized purposes and tailor-made configurations, thousands of containers with different images, a plethora of exposed endpoints, S3 buckets with both public and private access policies, backend databases that need to be accessed through secure internet gateways, and more. Once your head stops spinning, it’s time to make a plan and start looking at security solutions that can help you sort this mess out.

And that’s where a whole new problem starts: What kind of tools do you need? How many tools are you going to use? How do you evaluate them effectively? What can you expect from security instrumentation in the cloud?

Security solution architectures

When evaluating security solutions for your environment, it is useful to understand how their instrumentation works. By getting a grasp of the instrumentation behind a tool, you can better assess the strengths and shortcomings of each solution.

There are two main techniques used to instrument cloud resources: agentless and agent-based. State-of-the-art, cloud-native security solutions use a hybrid deployment approach, in which agentless and agent-based instrumentation either cover different use cases or enhance each other’s detections.

Agentless solutions offer a turnkey approach to basic posture and vulnerability management. They leverage the remote storage used by cloud resources, the cloud APIs, and the audit logs of cloud services to scan for security events without interfering with the original workload.
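To make the agentless idea concrete, here is a minimal sketch of a posture check that works only on metadata, never on the workload itself. Everything in it is an assumption made for illustration: the `BucketSnapshot` record, its fields, and the sample inventory are hypothetical stand-ins for what a scanner might assemble from cloud API responses, not the data model of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketSnapshot:
    """Hypothetical record built from cloud API responses (assumed fields)."""
    name: str
    public_read: bool
    encrypted: bool


def find_posture_issues(snapshots):
    """Return (bucket name, issue) pairs by inspecting metadata only."""
    issues = []
    for bucket in snapshots:
        if bucket.public_read:
            issues.append((bucket.name, "publicly readable"))
        if not bucket.encrypted:
            issues.append((bucket.name, "encryption at rest disabled"))
    return issues


# Made-up inventory; a real scanner would fill this from the provider's APIs.
INVENTORY = [
    BucketSnapshot(name="marketing-assets", public_read=True, encrypted=True),
    BucketSnapshot(name="customer-exports", public_read=False, encrypted=False),
    BucketSnapshot(name="build-cache", public_read=False, encrypted=True),
]

if __name__ == "__main__":
    for name, issue in find_posture_issues(INVENTORY):
        print(f"{name}: {issue}")
```

Because the check runs against a snapshot, its findings are only as fresh as the last time the inventory was collected, which is the limitation that pushes real-time use cases toward agents.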

Agent-based solutions instead deploy a software probe alongside the workloads to inspect and monitor them. There are various techniques for this inspection, with the state of the art currently pointing at probes based on eBPF (extended Berkeley Packet Filter) as the preferred solution.

Getting started with agentless is quick and easy, but its use cases are limited compared to agent-based solutions. If you don’t need real-time detection of what’s happening in your infrastructure, an agentless solution might be enough to cover your use cases. But if you require threat detection and response, near-real-time detection of posture drift, or accurate vulnerability management based on what’s actually running, you should use agent-based solutions.

You might also be tempted to use multiple tools to try to cover all your bases, but that might not always be possible, either because of budgetary constraints or simply because the tools work in incompatible ways. Trying to consolidate the data from different tools often means a great deal of effort for subpar results.

The hard design compromises

Now that you broadly know how your candidate solutions work, what are some more distinctive features that might inform your choice?

The main challenge security solutions face is striking the right balance between visibility, unobtrusiveness, and performance. Maximizing one of these factors inevitably has negative repercussions on the others. For example, maximizing visibility into the workloads will lead to a significant performance penalty at some point, and might also interfere with the application’s execution; on the other hand, focusing on performance or unobtrusiveness requires a conscious decision about the limits of the instrumentation’s visibility.

When evaluating these tools, you’ll need to understand where the compromises have been made, and whether the weaknesses align with your intended use case. Once again, understanding the design choices behind the tools will guide you through the selection process.

Cloud complexities

The classic design challenges get even harder to balance in a complex environment such as a modern cloud infrastructure.

Traditional designs often prove ineffective when dealing with the scale and the diversity of cloud resources. Security instrumentation for cloud environments needs to be purposefully designed keeping in mind multiple factors, such as scale, flexibility, and adaptability.

For example, an agentless tool that collects and centralizes its data in a single component would not make the cut in a medium- or large-scale cloud environment, where the individual resource count can easily reach hundreds of thousands of units. It will either need a huge amount of resources to process the incoming data, or it will suffer from the latency incurred in ingesting large amounts of data in batches.
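A rough back-of-the-envelope calculation shows how quickly batch ingestion adds up. The sketch below assumes a simplified model in which a single component collects events for a fixed interval and then processes the whole batch at a constant rate; the function name, its parameters, and all the numbers are illustrative assumptions, not measurements of any real tool.

```python
def batch_detection_delay_min(resources, events_per_resource_per_min,
                              batch_interval_min, capacity_events_per_sec):
    """Worst-case minutes between an event occurring and its batch being processed.

    Simplified model: an event at the start of a collection window waits the
    full interval, then the whole batch is processed at a constant rate.
    """
    events_per_batch = resources * events_per_resource_per_min * batch_interval_min
    processing_min = events_per_batch / capacity_events_per_sec / 60
    return batch_interval_min + processing_min


# Illustrative numbers only: 200,000 resources, 6 events per resource per
# minute, 15-minute batches, 50,000 events per second of processing capacity.
delay = batch_detection_delay_min(200_000, 6, 15, 50_000)
print(f"worst-case detection delay: {delay:.0f} minutes")  # prints 21 minutes
```

Under these assumptions each batch holds 18 million events and takes 6 minutes to process, so an event can wait up to 21 minutes before it is even examined. Shortening the interval reduces the wait but raises the sustained throughput the central component has to deliver, which is the trade-off described above.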

The heterogeneous types of resources (VMs, containers, S3 buckets, IAM solutions, database services, third-party SaaS, etc.) put the flexibility of security tools to the test, and the introduction of new abstractions in the cloud demands a high degree of adaptability.

It’s important to evaluate and test security solutions in environments comparable to the production ones they will have to protect. What sounds great on paper often turns out to be lacking when tested in real-world scenarios.

Tough choices

Selecting the right security solution involves many more factors than mere technical prowess, including budget, quality of support, and enterprise readiness. But a good instrumentation design is essential for something that needs to measure and hold strong against ever more skilled malicious actors.

If you want to read more about what design choices are behind the architecture of a state-of-the-art cloud security solution, take a look at our whitepaper “In Cloud Security, Architecture Matters”.