Cloud404 :: Cloud Computing, Risk Management, Security, Compliance


Archive for the ‘Forensics and Investigation’ Category

Cloud Investigation – Part Deux


As “the cloud” becomes an architectural option for businesses to leverage, many questions arise that seem basic, given what security pros have been through over the last 10 years:

  1. How is data protected?
  2. How is data accessed by my applications or those of a business partner?
  3. If there is an incident, how can I investigate and what evidence can I collect?

Questions #1 and #2 will likely get the most initial attention as “the cloud” matures and is embraced.  However, once the move to the cloud is made and an incident occurs, question #3 will jump in priority.

I’d like to share some thoughts on how investigations within “the cloud” differ from more traditional ones (e.g. a server that is on premises and under full control, or a single laptop computer where you can quickly obtain a hard drive and memory image).

First, there’s an important distinction to be made. In some cases, especially in consumer-focused services, very little investigation can be performed (that is, unless you have subpoena power). Providers simply don’t offer such interfaces because (a) consumers generally don’t perform investigative activities and (b) many privacy issues arise. For example, it’s difficult to determine who has actually browsed photos that are being stored online via a photo management service. This makes sense, because most consumers aren’t very paranoid about who is browsing their photos and the security controls that the providers offer tend to be straightforward. However, with a service such as e-mail, some consumers would like to know if an outsider is gaining access to their information. For example, within Google’s Gmail, one can see a list of the last few IP addresses (and the client types) that have accessed a mailbox. Basically, you are given what the application provider wants you to have. It’s difficult, if not impossible, to peel back the onion and access the data that is often needed to reach technically accurate conclusions. Also, the services are usually low-cost (or free), so the phrase “you get the support that you pay for” usually rings true.
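As a rough illustration of what a consumer can do with the little data a provider exposes, the sketch below checks a list of recent access records against the networks the account owner expects to see. The record format, field names, and addresses are hypothetical (the addresses come from documentation-only ranges); this is not Gmail's actual output or API.

```python
import ipaddress


def flag_unfamiliar(accesses, known_networks):
    """Return the access records whose IP falls outside every known network."""
    networks = [ipaddress.ip_network(n) for n in known_networks]
    flagged = []
    for record in accesses:
        addr = ipaddress.ip_address(record["ip"])
        if not any(addr in net for net in networks):
            flagged.append(record)
    return flagged


# Hypothetical records, e.g. copied by hand from a "recent activity" page.
recent = [
    {"ip": "192.0.2.10", "client": "browser"},
    {"ip": "198.51.100.7", "client": "IMAP"},
    {"ip": "203.0.113.99", "client": "browser"},
]
# Networks the account owner recognizes (home, office); also hypothetical.
known = ["192.0.2.0/24", "198.51.100.0/24"]

suspicious = flag_unfamiliar(recent, known)
```

An unfamiliar address is only a lead, not a conclusion: mobile carriers, VPNs, and travel all produce legitimate accesses from unexpected networks.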

Next, if we look at the enterprise scenario, access to low-level data within the security investigation process is quite important. The enterprise wants to peel back the onion and obtain low-level information about how the application is behaving, even if it is running in “the cloud.” When a security incident occurs, enterprise security teams want to be empowered to perform their own investigation without depending on the provider. From a provider perspective, “self-service” is an important element to achieve product scale. So, we have to figure out how to investigate: (a) what information we can get, and (b) where and how we can obtain it.

Let’s rewind a bit. When a business decides to adopt cloud computing, it likely chooses one of the following deployment options:

  • Software as a service (SaaS): Microsoft Online BPOS, Google Gmail, etc.  High in the stack. You consume the software, and can’t programmatically alter how it behaves (however, it’s likely there are a few knobs to change configuration).
  • Platform as a service (PaaS): Microsoft Azure, Google App Engine, etc. You consume a platform, and upload applications that run within the provider’s “hosted sandbox.” In this model, there’s little access to the underlying OS, but you can upload code that runs at the provider’s site.
  • Infrastructure as a service (IaaS): Amazon AWS, GoGrid, etc.  Full access to virtual machines running on the provider’s site.

In each of these scenarios, data has different states:

  • Data is at rest, written on the disk within an application-specific or OS-specific file format.  This state may contain de-allocated data (i.e. deleted files) that may not be used by the application or operating system, but is still accessible since it has not been reallocated and overwritten.
  • Data is in motion, being transmitted from a source to a destination over a network via numerous protocols, all encapsulated within each other, and each with different types of security (or, frequently within old protocols and applications, none at all).
  • Data is in execution, loaded into memory as a process, which contains a series of executable steps that the processor is going to execute (threads). A process may need to reference data (such as a file), so it loads it into memory. Therefore, if you look at a snapshot of the memory of a server at any given time, you’d find process information, machine instructions, and allocated/de-allocated data. In this state, data may be de-allocated (i.e. memory that has been de-allocated by a process and not yet reallocated/overwritten), but is still accessible.

Within each deployment option, the accessibility of data within each state differs. In addition, so do the primary and collateral investigative sources of data.

I’ve tried to build out a few matrices to further understand the intersections, and how to focus investigative & evidence collection activities.

Certainly a work in progress, but you may find them helpful.

[Figure: Infrastructure-as-a-Service matrix]

[Figure: Platform-as-a-Service matrix]

[Figure: Software-as-a-Service matrix]
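For readers who want to build their own version of these matrices, here is a minimal sketch of the underlying structure: one cell per combination of deployment option and data state, each holding a list of candidate evidence sources. It is not a reconstruction of the matrices themselves; the class and function names are made up for illustration, and the single entry is only an example.

```python
from enum import Enum
from itertools import product


class Deployment(Enum):
    SAAS = "Software as a service"
    PAAS = "Platform as a service"
    IAAS = "Infrastructure as a service"


class DataState(Enum):
    AT_REST = "at rest"
    IN_MOTION = "in motion"
    IN_EXECUTION = "in execution"


def empty_matrix():
    """One cell per (deployment option, data state) pair, each an empty list."""
    return {cell: [] for cell in product(Deployment, DataState)}


def add_source(matrix, deployment, state, source):
    """Record a candidate evidence source for one cell of the matrix."""
    matrix[(deployment, state)].append(source)


def gaps(matrix):
    """Cells for which no evidence source has been identified yet."""
    return [cell for cell, sources in matrix.items() if not sources]


matrix = empty_matrix()
# Illustrative entry only; fill in the cells from your own provider's capabilities.
add_source(matrix, Deployment.IAAS, DataState.AT_REST, "virtual disk image")
```

Listing the empty cells with `gaps(matrix)` is a quick way to see where you would currently have nothing to collect if an incident occurred.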

Written by Craig

January 22, 2010 at 11:30 pm

Security Investigation & Forensics in the Cloud


As security professionals, we are tasked with securing data and investigating situations when a security breach occurs. However, as the cloud becomes a mainstream architectural option that businesses will undoubtedly embrace, there’s a bit of a problem for security practitioners: part of the business proposition of “the cloud” is all about losing control and abstracting the complexities and implementation details associated with infrastructure and applications.

This loss of control will present challenges for investigators, and will require a retooling of the traditional approaches that have been accepted by the computer forensic community. To digest a situation, investigators need enough control to obtain knowledge of the technical resources and processes involved, and the ability to interview associated individuals. In many forensic cases, we need to reconstruct the environment to recreate scenarios and test hypotheses. Otherwise, our conclusions may not be trusted.

Before we dive into how “the Cloud” will monkey wrench the modern-day computer investigator mindset, we need to reflect on a few of the basics of investigation and forensics:

  • Data has to be collected in a manner that maximizes its integrity.
  • Preserving chain of custody for the “best evidence” is critical to admissibility in a court of law.
  • Conclusions that are derived from evidence should be reproducible by peers through well-accepted methods, within a controlled, similar environment.
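The first two tenets can be sketched in a few lines: hash the evidence at collection time, record who handled it and when, and re-check the hash later. This is a minimal illustration, not an accepted forensic procedure; the entry's field names are assumptions, and a real chain-of-custody log has legal requirements that a dictionary does not capture.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path, handler, action):
    """Build one chain-of-custody log entry (field names are illustrative)."""
    return {
        "item": path,
        "sha256": sha256_of(path),
        "handler": handler,
        "action": action,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def still_intact(path, entry):
    """True if the file's current hash matches the hash recorded in the entry."""
    return sha256_of(path) == entry["sha256"]
```

Calling `custody_entry("disk.img", "analyst name", "collected")` at acquisition and `still_intact(...)` before each analysis gives a peer something concrete to verify.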

If we take these tenets to a cloud context, many questions immediately come to mind…

  1. In the heavily virtualized/abstracted world of cloud computing, how can I identify and obtain the data that I need?
  2. In the distributed cloud model, what collateral data can I identify and collect to help me prove/disprove a hypothesis?
  3. What data does my provider log?  How long do they keep it?
  4. What data will my provider give to me?
  5. What knobs can I turn up to get the data that I need?
  6. Does my provider expect me to do this in a self-serve fashion so they are not involved in the interpretation of data?  Do they provide an API that I can use to gather it?
  7. If I need to ask my provider for data, how long will it take them to produce it?
  8. How/will they vouch for the integrity of the data?
  9. How/will they transfer it to me in a way that preserves integrity?
  10. What/where exactly is the “best evidence”?
  11. What methods and procedures are accepted?

The answers to these questions are complex, and are not based upon “the cloud” itself. Why? Because “the cloud” itself isn’t uniform. Each provider will have its own approach to its cloud offerings, and each will enable a different form and depth of investigation.

THE CLOUD -vs- US

Let’s set the context: thus far, mainstream cloud offerings are based on the Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) models. Let’s take a look at each, and the respective good/bad news in the context of investigation…

Software as a Service (SaaS): What is it? The SaaS model is the easiest to digest. A SaaS provider invokes an instance of an application for your organization. There are knobs that can be turned up and down, and basic configuration can be applied. You may be able to interface with the application via an API, but you won’t have deep programmatic control to modify the core application. Examples of this model: Google Gmail, Microsoft Online BPOS, the popular Less packages [http://lesseverything.com/solutions], and the Zoho suite [http://www.zoho.com].

  • SaaS Good news: You might be able to get high-level application logs. These logs might record successes and failures, and might reflect the actual activities within the environment. It will depend on what the provider decides to log, and how long they store it. In some cases, some services (such as federated authentication) may be handled by a separate service. This separate service (which may be operated by a completely separate provider) might have “collateral” data that you may find of interest, but you’ll have to figure out how to get it. The best news of this scenario is that you’ll be able to recreate the environment with a high level of accuracy* (of course, many disclaimers apply here). For example, in the SaaS email deployment, many providers of messaging solutions have an option for “message journaling.” This feature tells the provider to transparently forward a “carbon-copy” of all messages to an archive service. This service can be used to gather evidence from the organization’s email activity. However, to use it, you need to ensure that (a) it’s been purchased, (b) it is configured correctly (e.g. retention policy), and, in the case that you choose a different provider, (c) your email service is configured to send data to it.

  • SaaS Bad news: Low-level disk imaging? Very unlikely. Installing forensic tools to obtain system state information? Even more unlikely. Interpreting information? Difficult. You’ll need to know a lot about the application, and all of the scenarios in which it can be used. For example, many SaaS accounting applications have (a) a web interface, (b) a client application, and (c) a web service API (usually invoked with an API key, aka “shared secret authentication”). Will you be able to interpret log entries properly across these code paths? And, by the way, you may not have a version of the server-side software that you can deploy in your lab for low-level analysis.
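To make the "multiple code paths" problem concrete, here is a sketch that maps log entries from a web interface, a client application, and a web service API into one schema and one timeline. Every field name and sample value is hypothetical; a real provider's export would have to be mapped from its own documentation, and timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.

```python
from datetime import datetime, timezone

# Hypothetical field names for each code path; real exports will differ.
FIELD_MAPS = {
    "web": {"user": "login", "time": "ts", "action": "page"},
    "client": {"user": "account", "time": "when", "action": "operation"},
    "api": {"user": "api_key_owner", "time": "timestamp", "action": "method"},
}


def normalize(source, raw):
    """Map one raw entry to a common schema, keeping the original for reference."""
    fields = FIELD_MAPS[source]
    when = datetime.fromisoformat(raw[fields["time"]]).astimezone(timezone.utc)
    return {
        "source": source,
        "user": raw[fields["user"]],
        "time": when,
        "action": raw[fields["action"]],
        "raw": raw,
    }


def timeline(entries):
    """Merge (source, raw_entry) pairs into one list ordered by time."""
    return sorted((normalize(s, r) for s, r in entries), key=lambda e: e["time"])


events = timeline([
    ("api", {"api_key_owner": "alice",
             "timestamp": "2009-11-16T10:05:00+00:00",
             "method": "invoice.create"}),
    ("web", {"login": "alice",
             "ts": "2009-11-16T10:00:00+00:00",
             "page": "/login"}),
])
```

Keeping the untouched `raw` entry alongside the normalized fields matters: the interpretation can be challenged later, and the original record is what a peer would need to reproduce it.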

Platform as a Service (PaaS): In this model, the consumer deploys application packages to a runtime environment that is hosted by a cloud provider. For example, if you use Microsoft Windows Azure, you build applications in Visual Studio, compile, and publish a package through the Azure developer portal. This package (and a respective configuration file) is uploaded to Azure, where it is executed within a uniform Windows/.NET runtime environment. Similarly, with Google, you can build Python or Java applications which can be uploaded to Google App Engine for execution. In this model, you own the core application, and programmatically dictate how it will interact with other dependencies (such as calling database resources).

  • PaaS Good news: Your dev organization controls the core application. Thus, you can log information as you desire (to a storage location such as blob storage, to an external database (even at another cloud provider!), or even via SYSLOG to your SIEM solution). Since you have programmatic control of the platform, you can likely invoke custom code that interrogates system state and pulls logs. You just need to invest the time to configure it, and convince your development team that this feature is worth their time.
  • PaaS Bad news: You may or may not be able to get logging information from the underlying runtime environment. This will depend on how the provider has it configured, and what they’ll allow you to query. In this model, there are three important elements of the platform that need to be considered. In the Microsoft stack, they are (a) the virtualized OS, (b) IIS, and (c) the .NET runtime environment. The Google model is similar: there’s an OS, a web server, and a runtime environment (either Java or Python).
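As an illustration of the "log as you desire" point, the sketch below uses Python's standard `logging` module to emit security events as JSON over syslog. It assumes the platform permits outbound syslog traffic, which not every PaaS sandbox does; the logger name, the collector address, and the `context` convention for extra fields are all choices made for this example.

```python
import json
import logging
import logging.handlers


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a SIEM can parse it without regexes."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Investigative context passed by the caller via extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_security_logger(address=("localhost", 514)):
    """Attach a syslog handler; the collector address here is a placeholder."""
    logger = logging.getLogger("app.security")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=address)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as `build_security_logger().info("login failed", extra={"context": {"user": "alice", "src_ip": "203.0.113.5"}})` would then send one structured event to the collector. Deciding up front which fields every event carries is most of the work; the transport is the easy part.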

Infrastructure as a Service (IaaS): What is it? The consumer deploys virtual machines, on which they have administrative access. In some cases (GoGrid), the virtual machines use persistent storage (if a VM is rebooted, bits written to the disk will remain). Others, such as Amazon EC2, do not have persistent local storage: when an instance is terminated, data on its local disk is lost, and a new instance starts from the base VM image. In Amazon AWS, persistent storage comes from writing bits to Amazon EBS (Elastic Block Store), or another data repository (such as Amazon SimpleDB).

  • IaaS Good news: Although I haven’t tried it, I suspect that enterprise forensic software, such as EnCase Enterprise, can be installed. You also have the ability to connect to the underlying VM (e.g. Linux console or Windows Terminal Services) to perform deep interrogation of the machine. Therefore, many of the “traditional” processes associated with forensics can be followed (such as querying system state). I’d also suspect that you can obtain low-level disk access via an iSCSI interface.
  • More Good news: Many of the IaaS providers support snapshotting a running VM. So, you can capture the state of a running host quickly (and, for the ultra paranoid, perhaps via an API call that snapshots a host after system monitoring detects an abnormality).
  • IaaS Bad news: To do this level of investigation, you’ll need some robust connectivity to the Internet (or provider’s network). It’s plausible that you can have a similarly hosted “security cloud” that has loose coupling (and robust connectivity) to your “production cloud” where you can pull disk images, system state, etc, etc.
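The "snapshot a host after monitoring detects an abnormality" idea might look roughly like the sketch below. The `SnapshotClient` interface is entirely hypothetical: each IaaS provider names and shapes its snapshot call differently, so `create_snapshot` stands in for whatever the real API offers, and `FakeClient` exists only so the function can be exercised without a provider account.

```python
from datetime import datetime, timezone
from typing import Protocol


class SnapshotClient(Protocol):
    """Hypothetical provider interface; real IaaS APIs differ."""

    def create_snapshot(self, vm_id: str, label: str) -> str: ...


def snapshot_on_alert(client, vm_id, alert, now=None):
    """Snapshot the VM named in a monitoring alert and record what was taken."""
    now = now or datetime.now(timezone.utc)
    label = f"ir-{vm_id}-{now:%Y%m%dT%H%M%SZ}"
    snapshot_id = client.create_snapshot(vm_id, label)
    return {
        "vm_id": vm_id,
        "snapshot_id": snapshot_id,
        "label": label,
        "reason": alert,
        "taken_at_utc": now.isoformat(),
    }


class FakeClient:
    """Stand-in used to exercise the function without a real provider."""

    def __init__(self):
        self.calls = []

    def create_snapshot(self, vm_id, label):
        self.calls.append((vm_id, label))
        return f"snap-{len(self.calls)}"
```

The returned record is deliberately small: the VM, the snapshot identifier, the reason, and a UTC timestamp are the minimum needed to tie the snapshot back to the alert that triggered it.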

BLOCK STORAGE

There is a storage concept “in the cloud” that spans a few of these offerings. To enable “infinite” storage on a pay-per-use business model, many cloud providers are building “block storage.” Why does this matter? It’s because block storage isn’t the traditional file system that forensic practitioners hold close to their hearts and toolsets. In most cases, we grab a disk image and carve through the image looking for data. The evolution of “grabbing a disk image” is interesting: when most IT was on site, we’d just remove the disk, hook it up to an imaging environment, and grab a bitstream image. When IT became distributed, quite a few tools and techniques allowed us to remotely image machines. With a mere copy of dd and netcat, our world was only limited by bandwidth. However, in the cloud, the next step is unclear.
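For readers who have not done remote imaging, the dd-and-netcat approach boils down to reading a raw device in chunks, pushing each chunk to a remote sink, and hashing the stream so both ends can compare digests. The sketch below shows that loop in Python, with in-memory buffers standing in for the device and the network connection; it illustrates the idea and is not a replacement for either tool.

```python
import hashlib
import io


def stream_image(source, sink, chunk_size=64 * 1024):
    """Copy source to sink in chunks; return (bytes_copied, sha256 hex digest)."""
    digest = hashlib.sha256()
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        digest.update(chunk)
        copied += len(chunk)
    return copied, digest.hexdigest()


# In-memory stand-ins for a raw device and a socket's file object.
device = io.BytesIO(b"\x00" * 1000 + b"deleted-but-not-overwritten")
received = io.BytesIO()
size, sent_hash = stream_image(device, received)
```

Because the copy is a bitstream of the whole device, unallocated regions travel along with allocated ones, which is exactly what makes later carving possible.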

Cloud-based block storage contains a level of data abstraction that is great for the business folks (who write checks based on what is actually used), developers (cloud based storage is heavily abstracted through flexible APIs), and architects (who no longer need to worry about site-level, and in some cases, global fault tolerance).

However, for our forensic mindset, all access to block storage is via a platform-specific API, and there is no ability to “image” cloud-based block storage to “carve” through unallocated data, looking for remnants of activities. Access to this storage is purely logical, and completely focused on allocated space. [Note: This is my current understanding of the APIs available at the time of this writing. If anyone knows otherwise, please contact me.]

NEXT STEPS…

In conclusion, I’m not ready to throw away my traditional investigation tools quite yet. But, I do believe that the cloud is driving many macro factors that will require a massive recalibration of our tools, techniques, and understanding.  The rate at which systems can be rebuilt (since one only needs a credit card to provision new service) will be fast – and the tidal wave of new technologies to master will be daunting.

However, the super optimistic, revolution-through-evolution perspective is also clear: SaaS and PaaS cloud offerings will present some opportunity for us to push better security logic and logging higher in the stack, into our applications. It may also ignite the opportunity for SIEM providers to have cloud offerings, and publish some standardized APIs to allow us to easily and programmatically push log data into their platform and, perhaps more importantly, into a more standardized taxonomy (if you’ve configured a SIEM solution for custom applications, you know what I mean!).

Craig

Written by Craig

November 16, 2009 at 2:28 am