
Continuous Monitoring in the Cloud

Published 06/11/2018

By Michael Pitcher, Vice President, Technical Cyber Services, Coalfire Federal

I recently spoke at the Cloud Security Alliance’s Federal Summit on the topic “Continuous Monitoring / Continuous Diagnostics and Mitigation (CDM) Concepts in the Cloud.” As government has moved, and will continue to move, to the cloud, it is becoming increasingly important to ensure continuous monitoring goals are met in this environment. Cloud assets can be highly dynamic and short-lived, so traditional continuous monitoring methods that work for on-premises solutions don’t always translate to the cloud.

Coalfire has been involved with implementing CDM for various agencies and is the largest Third Party Assessment Organization (3PAO), having completed more FedRAMP authorizations than any other, which uniquely positions us to help customers think through this challenge. These concepts and challenges are not unique to the government agencies in the CDM program; they apply to other government and DoD communities as well as to commercial entities.

To review, Phase 1 of the Department of Homeland Security (DHS) CDM program focused largely on static assets and for the most part excluded the cloud. It centered on building and maintaining an inventory, which could then be enrolled in ongoing scanning as frequently as every 72 hours. The objective is to determine whether assets are authorized to be on the network, whether they are being managed, and whether they are running software that is vulnerable or misconfigured. As the cloud becomes part of the next round of CDM, it is important to understand how the approach to these objectives needs to adapt.

Cloud services enable resources to be allocated, consumed, and de-allocated on the fly to meet peak demand. Just about any system has times when more resources are required than others, and the cloud allows compute, storage, and network resources to scale with that demand. As an example, within Coalfire we have a Security Parsing Tool (Sec-P) that spins up compute resources to process vulnerability assessment files that are dropped into a cloud storage bucket. The compute resources exist for only a few seconds while the file gets processed, and then they are torn down. Examples such as this, as well as serverless architectures, challenge traditional continuous monitoring approaches.
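
For illustration, here is a minimal sketch of this event-driven pattern, assuming an AWS Lambda function triggered by S3 object-creation events (the parse_scan_results step is a hypothetical stand-in for Sec-P’s real parsing logic):

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Invoked by an S3 ObjectCreated event; the compute exists only
    for the duration of this run, then the platform tears it down."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        findings = parse_scan_results(body)  # hypothetical parsing step
        print(json.dumps({"file": key, "findings": len(findings)}))

def parse_scan_results(raw: bytes) -> list:
    # Placeholder for the real parsing logic (e.g., reading a scan-results file)
    return []
```

A scanner that sweeps the network every 72 hours will simply never see this function’s compute resources, which is the crux of the monitoring challenge.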

However, potential solutions are out there, including:

  • Adopting built-in services and third-party tools
  • Deploying agents
  • Leveraging Infrastructure as Code (IaC) review
  • Using sampling for validation
  • Developing a custom approach

Adopting built-in services and third-party tools

Dynamic cloud environments highlight the inadequacies of performing active and passive scanning to build inventories. Assets may simply come and go before they can be assessed by a traditional scan tool. Each of the major cloud service providers (CSPs), and many of the smaller ones, provides inventory management services in addition to services that can monitor resource changes – examples include AWS Systems Manager Inventory and CloudWatch, Microsoft’s Azure Resource Manager and Activity Log, and Google Cloud’s Asset Inventory and Cloud Audit Logs. There are also quality third-party applications that can be used, some of them already FedRAMP authorized. Regardless of the service/tool used, the key is interfacing it with the integration layer of an existing CDM or continuous monitoring solution. This can occur via API calls to and from the solution, which are made possible by the current CDM program requirements.
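
As a sketch of what that interface might look like, assuming AWS and the boto3 SDK (push_to_cdm is a hypothetical call into the CDM integration layer):

```python
import boto3

def snapshot_inventory(region: str = "us-east-1") -> list:
    """Pull the current EC2 inventory via the CSP API rather than a
    network sweep, so short-lived assets are still captured."""
    ec2 = boto3.client("ec2", region_name=region)
    assets = []
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                assets.append({
                    "id": inst["InstanceId"],
                    "image": inst["ImageId"],
                    "state": inst["State"]["Name"],
                    "launched": inst["LaunchTime"].isoformat(),
                })
    return assets

# push_to_cdm(snapshot_inventory())  # hypothetical call into the CDM integration layer
```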

Deploying agents

For resources that are going to have some degree of persistence, agents are a great way to perform continuous monitoring. Agents can check in with a master to maintain the inventory and can perform security checks as soon as the resource is spun up, instead of having to wait for a sweeping scan. Agents can be installed as part of the build process or even baked into a deployment image. Interfacing with the master node that controls the agents and comparing its roster to the inventory is a great way to perform cloud-based “rogue” asset detection, a requirement under CDM. On-premises, this concept is really about finding unauthorized assets, such as a personal laptop plugged into an open network port. In the cloud, it is all about finding assets that have drifted from the approved configuration and are out of compliance with the security requirements.
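
A minimal sketch of that comparison, assuming the CSP inventory and the agent master’s roster can each be reduced to a set of instance IDs (the IDs below are hypothetical):

```python
def find_unmanaged_assets(csp_inventory: set, agent_roster: set) -> set:
    """Assets the CSP says exist but that never checked in with the
    agent master; in the cloud these are drift/compliance gaps rather
    than personal laptops on an open port."""
    return csp_inventory - agent_roster

def find_stale_agents(csp_inventory: set, agent_roster: set) -> set:
    """Agents still registered with the master for assets the CSP has
    already torn down; candidates for deregistration."""
    return agent_roster - csp_inventory

# Example with hypothetical instance IDs:
csp = {"i-0a1", "i-0b2", "i-0c3"}
agents = {"i-0a1", "i-0c3", "i-0d4"}
print(find_unmanaged_assets(csp, agents))  # {'i-0b2'} -> never enrolled
print(find_stale_agents(csp, agents))      # {'i-0d4'} -> already gone
```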

For resources such as our Coalfire Sec-P tool from the previous example, which exists only as code more than 90 percent of the time, we need to think differently. An agent approach may not work, as the compute resources may not exist long enough to even check in with the master, let alone complete any security checks.

Infrastructure as Code (IaC) review

IaC is used to deploy and configure cloud resources such as compute, storage, and networking. It is essentially a set of templates that “programs” the infrastructure. IaC is not a new concept, but the speed at which cloud environments change is bringing it into the security spotlight.

Now we need to consider how to assess the code that builds and configures the resources. There are many tools and approaches for doing this; application security is not new, it just must be re-examined when we consider it part of performing continuous monitoring on infrastructure. The good news is that IaC uses structured formats and common languages such as XML, JSON, and YAML. As a result, it is possible to use tools, or even write custom scripts, to perform the review. This structured format also allows for automated and ongoing monitoring of the configurations, even when the resources exist only as code and are not “living.” It is also important to consider what software spins up with the resources, as the packages that are leveraged must be up-to-date versions that do not have vulnerabilities. Code should undergo a security review whenever it changes, and thus the approved code can be continuously monitored.
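
As a sketch of such a custom script, here is a walker over a JSON-format template that flags one illustrative misconfiguration, security-group rules open to the world (a YAML template would additionally need PyYAML and a loader aware of CloudFormation’s short-form tags):

```python
import json
import sys

def walk(node, path=""):
    """Recursively yield every (path, value) pair in the template."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from walk(v, f"{path}/{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from walk(v, f"{path}[{i}]")
    else:
        yield path, node

def review(template_path: str) -> list:
    with open(template_path) as f:
        doc = json.load(f)
    findings = []
    for path, value in walk(doc):
        # One illustrative rule: flag ingress rules open to the entire internet
        if path.endswith("CidrIp") and value == "0.0.0.0/0":
            findings.append(f"world-open ingress at {path}")
    return findings

if __name__ == "__main__":
    print(json.dumps(review(sys.argv[1]), indent=2))
```

Real reviews would carry a library of such rules (encryption settings, logging, IAM scope), but the principle is the same: the template is structured data, so the checks can run automatically on every change.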

Setting asset expiry is one way to enforce CDM principles in a DevOps-heavy environment that leverages IaC. The goal of CDM is to assess assets every 72 hours, so we can set them to expire (get torn down and rebuilt) within that timeframe, ensuring they are always running on fresh infrastructure built from approved code.
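
A minimal sketch of such an expiry sweep, assuming AWS and boto3 (the dry_run default and the reliance on autoscaling/IaC to rebuild terminated instances are assumptions):

```python
from datetime import datetime, timedelta, timezone

import boto3

MAX_AGE = timedelta(hours=72)  # CDM's assessment window

def expire_stale_instances(region: str = "us-east-1", dry_run: bool = True):
    """Flag (or terminate) instances older than 72 hours so they are
    rebuilt from the current, approved IaC rather than patched in place."""
    ec2 = boto3.client("ec2", region_name=region)
    cutoff = datetime.now(timezone.utc) - MAX_AGE
    stale = []
    for page in ec2.get_paginator("describe_instances").paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for res in page["Reservations"]:
            for inst in res["Instances"]:
                if inst["LaunchTime"] < cutoff:
                    stale.append(inst["InstanceId"])
    if stale and not dry_run:
        ec2.terminate_instances(InstanceIds=stale)  # assumes IaC/autoscaling rebuilds them
    return stale
```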

Sampling

Sampling is to be used in conjunction with the methods above. In a dynamic environment where the total number of assets is always changing, there should be a solid core of the fleet that can be scanned via traditional active scanning. We just need to accept that we are not going to be able to scan the complete inventory. There should also be far fewer profiles, or “gold images,” than there are total assets. The idea is that if you can scan at least 25 percent of each profile in any given scan, there is a good chance you will find all the misconfigurations and vulnerabilities that exist on the resources built from the same profile, and identify assets drifting from the fleet. This is enough to identify systemic issues such as bad deployment code or resources being spun up with out-of-date software. If you find resources in a profile that have a large discrepancy with the others in that same profile, that is a sign of DevOps or configuration management issues that need to be addressed. We are not giving up on the concept of having a complete inventory, just accepting the fact that there really is no such thing.
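
A sketch of how a scan sample might be drawn per profile, assuming each asset carries its source image ID (the fleet below is hypothetical):

```python
import math
import random
from collections import defaultdict

def pick_scan_sample(assets: list, rate: float = 0.25) -> list:
    """Choose at least 25 percent of each profile ("gold image") per
    scan, rather than trying to sweep an inventory that never stands still."""
    by_profile = defaultdict(list)
    for asset in assets:
        by_profile[asset["image"]].append(asset["id"])
    sample = []
    for image, ids in by_profile.items():
        k = max(1, math.ceil(len(ids) * rate))
        sample.extend(random.sample(ids, k))
    return sample

# Hypothetical fleet: two profiles, uneven counts
fleet = [{"id": f"i-{n:03x}", "image": "ami-web" if n % 3 else "ami-db"}
         for n in range(40)]
print(pick_scan_sample(fleet))
```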

Building IaC assets specifically for the purpose of performing security testing is a great option to leverage as well. These assets can have persistence and be “enrolled” into a continuous monitoring solution to report on vulnerabilities in a similar manner to on-premises devices, via a dashboard or otherwise. The total number of vulnerabilities in the fleet can then be estimated as the quantity found on each sample asset multiplied by the number of assets from the same profile living in the fleet. As stated above, we can get this count from the CSP services or third-party tools.
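
The arithmetic, as a sketch with hypothetical profile names and counts:

```python
def estimate_fleet_vulns(sample_findings: dict, live_counts: dict) -> int:
    """Extrapolate: vulnerabilities found on the persistent sample asset
    for each profile, multiplied by the number of live assets built from
    that same profile (live counts come from CSP inventory services)."""
    return sum(sample_findings[p] * live_counts.get(p, 0)
               for p in sample_findings)

# E.g., 3 findings on the "ami-web" sample across 40 live web instances,
# plus 1 finding on "ami-db" across 5 instances: 3*40 + 1*5 = 125
print(estimate_fleet_vulns({"ami-web": 3, "ami-db": 1},
                           {"ami-web": 40, "ami-db": 5}))  # 125
```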

Custom approaches

There are many CSPs out there serving endless cloud-based use cases, and each has a variety of services and tools available, both native and third-party. What I have reviewed are high-level concepts; each customer will need to dial in the specifics based on their use cases and objectives.
