Live Response: Collecting Volatile Data (Windows Forensic Analysis) Part 1

Introduction

Investigators today are increasingly facing situations in which the traditional, widely accepted computer forensic methodology of unplugging the power to a computer and then acquiring a bit-stream image of the system hard drive via a write blocker is, simply, not a viable option. For instance, it is becoming more common for investigators to encounter servers that are critical to business operations and cannot be shut down. Investigators and incident responders are also seeing instances in which the questions they have (or are asked) cannot be answered using the contents of an imaged hard drive alone. For example, I’ve spoken with law enforcement officers regarding how best to handle situations involving missing children who were lured from their homes or schools via instant messages (IMs), particularly when faced with the fact that some IM applications do not write chat logs to disk, either at all or in their default configurations.

Questions such as these are not limited to law enforcement. In many cases, the best source of information or evidence is available in computer memory (network connections, contents of the IM client window, memory used by the IM client process, encryption keys and passwords, etc.). In other cases, investigators are asked whether a Trojan or some other form of malware was active on the system and whether sensitive information was copied off the system. Essentially, first responders and investigators are being asked questions regarding what activity was occurring on the system while it was live, and these questions cannot be answered when following the traditional, "purist" approach to digital forensics. Members of information technology (IT) staffs are finding anomalous or troubling traffic in their firewalls and intrusion detection system (IDS) logs, and are shutting off systems from which the traffic is originating before determining which process was responsible for the traffic. Situations such as these require that the investigator perform live response—collecting data from a system while it is still running. This in itself raises some issues, which we will address throughout this topic.


Perhaps more important is that the requirement to perform some kind of live response is no longer simply a choice organizations make on their own. Instead, in some ways live response is being mandated by legislation as well as by regulatory bodies (the Payment Card Industry, or PCI, standards come to mind). When a compromise occurs on a system, these regulatory bodies ask three basic questions:

■ Was the system compromised?

■ Did the compromised system contain "sensitive" data? (See the appropriate legislation or regulatory guidelines for the definition of "sensitive" data.)

■ If the answer to both of the preceding questions is "yes", did the compromise of the system lead to the exposure of that sensitive data?

However, many organizations are simply unprepared for an incident, and as such, the activities of their responders can expose those organizations to greater risk than the incident itself, largely due to the fact that the "shut-the-system-off-and-wipe-it" mentality of many IT organizations does not allow for the collection of the necessary data to answer the inevitable questions. These questions invariably arise when the legal or compliance department of the organization hears about the incident, and then finds out that those questions cannot be answered.

Live Response

Investigators today face a number of issues where unplugging a system (or several systems) and acquiring an image of the hard drive(s) might not be an option. As the use of e-commerce continues to grow, system downtime is measured in hundreds or thousands of dollars per minute, based on lost transactions. Therefore, taking a system down to acquire a hard-drive image has a serious effect on the bottom line. Also, some companies have service-level agreements (SLAs) guaranteeing "five nines" of uptime—that is, the company guarantees to its customers that the systems will be up and operational 99.999 percent of the time (outside of maintenance windows, of course). Taking a system with a single hard drive offline to perform imaging can take several hours, depending on the configuration of the system.

The Information Superhighway is no longer just a place for joy riders and pranksters. A great deal of serious crime takes place in cyberspace, and criminal activities are becoming increasingly sophisticated. Software programs can get into your computer system and steal your personal information (passwords, personal files, income tax returns, and the like), yet the code for some of these programs is never written to the hard drive; the programs exist only in memory. When the system is shut down, all evidence of the program disappears.

In April 2006, Seagate introduced the first 750GB hard drives. Today, I regularly see external hard drives available in sizes greater than 1.5 terabytes (TB), and I see multiterabyte storage systems on customer networks. Imagine a RAID 5 array of eight 1TB hard drives: 8 TB of raw capacity, roughly 7 TB usable once parity is accounted for. How long would it take you to image those hard drives? With certain configurations, it can take investigators four or more hours to acquire and verify a single 80GB hard drive. And would you need to image the entire system if you were interested in only the activities of a single process and not in the thousands of files resident on the system?

In some cases, we might want to collect some information about the live system before shutting it down, acquiring a bit-stream image of the hard drive or drives, and performing a more traditional computer forensic investigation. The information you would be most interested in is volatile in nature, meaning that it ceases to exist when power is removed from the system. This volatile information usually exists in physical memory, or RAM, and consists of such things as information regarding processes, network connections, the contents of the Clipboard, and so on. This information describes the state of the system at the time you are standing in front of it, sitting at the console, or accessing it remotely. As an investigator, you could be faced with a situation in which you must quickly capture data and analyze it (analysis is covered in the next topic) to determine the nature and scope of the incident. When power is removed from the system in preparation for imaging the hard drive in the traditional manner, this information simply disappears. However, you also need to keep in mind that any actions you take on a live system (e.g., running antivirus scans, searching for files or credit card data, reconfiguring the system, etc.) are going to leave artifacts of their own, and may well overwrite useful or pertinent data. Therefore, collecting and preserving this volatile data should be your first concern.

We do have options available to us—tools and techniques we can use to collect this volatile information from a live system, giving us a better overall picture of the state of the system as well as providing us with a greater scope of information. This is what "live response" entails: accessing a live, running system and collecting volatile (and in some cases, nonvolatile) information.

There is another term you might hear that is often confused with live response: live acquisition. Live response deals with collecting volatile information from a system; live acquisition describes acquiring the hard drive while the system is still running and creating an image of that hard drive. In this topic, we’ll start by discussing tools, techniques, and methodologies for performing live response. When we talk about performing live response, we need to understand what information we want to collect from the system and how we should go about collecting it. In this topic, we will walk through the what and how of collecting volatile information from a system; in the next topic, we will discuss how to analyze this data. Following that, we will examine some solutions for performing a live acquisition.

Before we start discussing live-response tools and activities, we need to address two important topics: Locard’s Exchange Principle and the order of volatility. These concepts are the cornerstones of this topic and live response in general, and we will discuss them in detail.

Locard’s Exchange Principle

In performing live response, investigators and first responders need to keep a very important principle in mind. When we interact with a live system, whether as the user or as the investigator, changes will occur on that system. On a live system, changes will occur simply due to the passage of time, as processes work, as data is saved and deleted, as network connections time out or are created, and so on. Some changes happen when the system just sits there and runs. Changes also occur as the investigator runs programs on the system to collect information, volatile or otherwise. Running a program causes information to be loaded into physical memory, and in doing so, physical memory used by other, already running processes may be written to the page file. As the investigator collects information and sends it off the system, new network connections will be created. All of these changes can be collectively explained by Locard’s Exchange Principle. Changes that occur to a system as the system itself apparently sits idle are referred to as "evidence dynamics" and are similar to rain washing away potential evidence at a crime scene.

In the early 20th century, Dr. Edmond Locard’s work in the area of forensic science and crime scene reconstruction became known as Locard’s Exchange Principle. This principle states, in essence, that when two objects come into contact, material is exchanged or transferred between them. If you watch the popular CSI crime show on TV, you’ll invariably hear one of the crime scene investigators refer to possible transfer. This usually occurs after a scene in which a car hits something or when an investigator examines a body and locates material that seems out of place.

This same principle applies in the digital realm. For example, when two computers communicate via a network, information is exchanged between them. Information about one computer will appear in the process memory and/or log files on the other (see the "Locard and Netcat" sidebar for a really cool demonstration of this concept). When a peripheral such as a removable storage device (a thumb drive, an iPod, or the like) is attached to a Windows computer system, information about the device will remain resident on the computer. When an investigator interacts with a live system, changes will occur to that system as programs are executed and data is copied from the system. These changes might be transient (process memory, network connections) or permanent (log files, Registry entries).

Tools & Traps

Locard and Netcat

You can use simple tools, such as netcat (http://en.wikipedia.org/wiki/Netcat), to demonstrate Locard’s Exchange Principle. If you’re not familiar with netcat (nc.exe on Windows systems), suffice it to say that netcat is an extremely versatile tool that allows you to read and write information across network connections.

For this example, you will need three tools: netcat (nc.exe), pmdump.exe (www.ntsecurity.nu/toolbox/pmdump/), and strings.exe (http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx) or BinText (available from www.foundstone.com/us/resources/proddesc/bintext.htm). You can run this example using either one or two systems, but it works best when two systems are used. If you’re using one system, create two directories, with a copy of netcat in each directory.

Start by launching netcat in listening mode with the following command line:

nc -L -d -p 8080 -e cmd.exe

This command line tells netcat to listen on port 8080, in detached mode, and when a connection is made to launch the command prompt. Once you’ve typed in the command line and pressed Enter, open the Task Manager and note the process identifier (PID) of the process you just created. (Here I am using netcat Version 1.11 NT, which I retrieved from www.vulnwatch.org/netcat. At the time of this writing, the Web site does not appear to be available.)

Now open another command prompt on the same system, or go to your other system and open the command prompt. Type the following command line to connect to the netcat listener you just created:

nc <IP address> 8080

This command line tells netcat to open in client mode and to connect to the Internet Protocol (IP) address on port 8080, where our listener is waiting. If you’re running the test on a single system, use 127.0.0.1 as the IP address.

Once you’ve connected, you should see the command prompt header that you normally see, showing the version of the operating system and the copyright information. Type a couple of commands at the prompt, such as dir or anything else, to simply send information across the connection.

On the system where the netcat listener is running, open another command prompt and use pmdump.exe (discussed later in this topic) to obtain the contents of memory for the listener process:

pmdump <PID> netcat1.log

This command will obtain the contents of memory used by the process and will put it into the file netcat1.log. You may also dump the process memory of the client side of the connection, if you like. Now that you have the process memory saved in a file, you can exit both processes. Run strings.exe against the memory file from the listener or open the file in BinText and you will see the IP address of the client. Doing the same thing with the client’s memory file will display information about the system where the listener was running, demonstrating the concept of Locard’s Exchange Principle.
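If you don't have netcat handy, the exchange at the heart of this sidebar can also be demonstrated with a short, self-contained script. The sketch below is an illustration, not part of the original exercise: it uses only Python's standard socket library on the loopback interface, and shows that the moment two endpoints connect, each one necessarily records information about the other (the peer's address), just as the listener's process memory recorded the client's IP address above.

```python
import socket
import threading

# Minimal sketch of Locard's Exchange Principle on a network connection:
# when two endpoints communicate, each records information about the other.
result = {}
ready = threading.Event()
port_holder = {}

def listener():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))            # bind to any free loopback port
    srv.listen(1)
    port_holder["port"] = srv.getsockname()[1]
    ready.set()                           # tell the client which port to use
    conn, peer = srv.accept()             # peer is (client IP, client port)
    result["listener_saw"] = peer         # the listener now holds client info
    conn.close()
    srv.close()

t = threading.Thread(target=listener)
t.start()
ready.wait()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port_holder["port"]))
result["client_saw"] = cli.getpeername()  # the client now holds listener info
cli.close()
t.join()

print(result["listener_saw"][0])  # → 127.0.0.1
print(result["client_saw"][0])    # → 127.0.0.1
```

Transfer happens in both directions, exactly as running strings.exe against both process memory dumps demonstrates.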

Programs that we use to collect information might have other effects on a live system. For example, a program might need to read several Registry keys, and the paths to those keys will be read into memory. Windows XP systems perform application prefetching, so if the investigator runs a program that the user has already run on the system, the last access and modification times of the prefetch file (as well as the contents of the file itself) for that application will be modified. If the program that the investigator runs hasn’t been used before, a new prefetch file will be created in the Prefetch directory (assuming the contents of the Prefetch directory haven’t reached their 128 .pf file limit … but more on that later in the topic).

Investigators not only need to understand that these changes will occur but also must document those changes and be able to explain the effects their actions had on the system to a reasonable extent. For example, as an investigator you should be able to determine which .pf files in the XP Prefetch directory are a result of your efforts and which are the result of user activities. The same is true for Registry values. As with the application prefetching capabilities of Windows XP, your actions will have an effect on the system Registry. Specifically, entries may appear in the Registry, and as such the LastWrite times of the Registry keys will be updated. Some of these changes might not be a direct result of your tools or actions, but rather are made by the shell (i.e., Windows Explorer), due simply to the fact that the system is live and running.

By testing and understanding the tools you use, you will be able to document and explain what artifacts found on a system are the result of your efforts and which are the result of actions taken by a user or an attacker.
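One simple documentation habit that supports this: note a timestamp when you begin work, and later separate artifacts created after that point from those that predate it. The sketch below shows the idea for Prefetch entries; a temporary directory stands in for C:\Windows\Prefetch, and the .pf file names are fabricated for illustration. It is a rough heuristic, not a substitute for examining the files themselves.

```python
import os
import tempfile
import time

# Sketch: flag .pf entries whose modification time postdates the responder's
# noted arrival time, separating responder artifacts from user artifacts.
prefetch_dir = tempfile.mkdtemp()  # stand-in for C:\Windows\Prefetch

# A pre-existing (user-created) entry; name is fabricated.
open(os.path.join(prefetch_dir, "NOTEPAD.EXE-AA11BB22.pf"), "w").close()

arrival_time = time.time()         # responder notes when work began
time.sleep(0.05)

# An entry created by a tool the responder ran; name is fabricated.
open(os.path.join(prefetch_dir, "NC.EXE-33CC44DD.pf"), "w").close()

ours = [f for f in os.listdir(prefetch_dir)
        if f.endswith(".pf")
        and os.path.getmtime(os.path.join(prefetch_dir, f)) > arrival_time]
print(ours)  # → ['NC.EXE-33CC44DD.pf']
```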

Tip:

When considering whether to engage in live-response activities it is very important to keep in mind that although your actions do have an effect on the system (processes loaded into memory, files created on the system as a result of your actions, etc.), so does your inaction. Think about it. A live system is running, with things going on all the time. Even while a system just sits there, processes are running and actions are occurring on the system. With Windows XP, simply wait 24 hours and a System Restore Point will be created automatically (by default). Wait three days and the system will conduct a limited defragmentation. Also consider the fact that if someone is exfiltrating data from your systems, while you wait and do nothing that person will continue to take more data. So, the question of whether to engage in live response really comes down to (a) do I do nothing, or (b) do I take the correct actions to protect my organization as best I can under the circumstances?

Order of Volatility

We know that volatile information exists in memory on a live system and that certain types of volatile information can be, well, more volatile than others. That is, some information on a live system has a much shorter shelf life than other information. For instance, network connections time out, sometimes within several minutes, if they aren’t used. You can see this by browsing to a specific site or making some other network connection and viewing that connection via netstat.exe. Then shut down the client application you’re using and the state of the network connection will change over time before it eventually disappears from the output of netstat.exe. The system time, however, changes much more quickly, while the contents of the Clipboard will remain constant until either they are changed or power is removed from the system. Additionally, some processes, such as services (referred to as daemons in the UNIX realm) run for a long time, whereas other processes can be extremely short-lived, performing their tasks quickly before disappearing from memory. This would indicate that we need to collect certain information first so that we can capture it before it changes, whereas other volatile data that happens to be more persistent can be collected later.

A great place to go for this information is the Request for Comments (RFC) document 3227, "Guidelines for Evidence Collection and Archiving" (www.faqs.org/rfcs/rfc3227.html). This RFC, published in February 2002, remains pertinent today, since core guiding principles don’t change as technologies change. The RFC specifies such principles for evidence collection as capturing as accurate a picture of the system as possible; keeping detailed notes; noting differences between UTC, local time, and system time; and minimizing changes to data as much as possible. We’ll keep these principles in mind throughout our discussion of live response.

Tip:

RFC 3227 points out that you should note the difference between the system clock and Coordinated Universal Time (UTC), as well as take detailed notes in case you need to explain or justify your actions (the RFC says "testify"), even years later.

Of specific interest in this RFC document is Section 2.1, "Order of Volatility," which lists certain types of volatile information in order, from most to least volatile. Items that are apt to change or expire more quickly due to the passage of time (e.g., processes, network connections, etc.) should be collected first. By contrast, less volatile information, such as the physical configuration of the system, can be collected later. Using these guidelines, we can see what types of information we need to collect from a system, where to look for that information, what tools to use to retrieve it, and even how to get that information off the system, thereby minimizing the impact to the "victim" system while at the same time collecting the information we need to perform our analysis.
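The ordering idea can be made concrete in a few lines. In the sketch below, each collection task is given an illustrative volatility rank (lower means more volatile, in the spirit of RFC 3227's Section 2.1), and the collection plan simply sorts on that rank. The ranks themselves are assumptions chosen for the example, not values taken from the RFC.

```python
# Sketch: order collection tasks from most to least volatile.
# Ranks are illustrative assumptions (lower rank = collect first).
tasks = {
    "network connections": 1,
    "running processes": 2,
    "process memory": 3,
    "clipboard contents": 4,
    "system configuration": 5,
}

ordered = sorted(tasks, key=tasks.get)
print(ordered[0])   # → network connections
print(ordered[-1])  # → system configuration
```

A scripted collection driver built this way guarantees that the most perishable data is captured before slower, more stable sources.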

When to Perform Live Response

Perhaps the most prominent question on the minds of investigators and first responders is "When should I consider live response?" In most instances today (e.g., criminal or civil cases, internal corporate investigations), no predefined set of conditions dictates when live response should be performed. In fact, in many situations, live response (and, subsequently, volatile information) isn’t considered at all. The decision to perform live response depends on the situation, the environment (taking into consideration the investigator’s intent, corporate policies, or applicable laws), and the nature of the issue with which you have been presented.

Let’s look at a couple of examples. Say you’ve been contacted by a system administrator reporting some unusual network traffic. She received an alert from the IDS, and in checking the firewall logs she found some suspicious log entries that seemed to correlate with the IDS alerts. She says some odd traffic seems to be coming from one particular system that sits on the internal network. She already has the IDS alerts and network logs, but you decide to perform a more comprehensive capture of network traffic. In doing so, you realize that you have the network traffic information, but how do you associate it with a particular system? That’s pretty easy, right? After all, you have the system’s IP address (as either the source or the destination IP address in your network capture), and if you’ve also captured Ethernet frames, you also have the Media Access Control (MAC) address. But how do you then associate the traffic you see on the network with a particular user and/or process running on the system?

To definitively determine the source of the suspicious traffic (which process is generating it), you’d have to collect information about running processes and network connections from the system prior to shutting it down. Other information collected during live response might reveal that someone is logged in to the system remotely, via a network logon or a backdoor, or that a running process was launched as a Scheduled Task.

What other types of situations might suggest or even require a live response? How about the "Trojan defense," in which illicit activity is attributed to a Trojan or backdoor? In October 2002, Julian Green was found to have several (some reports stated more than 170) illicit images on his system. A forensic examination of his system found that his system had several Trojans that would access illicit sites whenever he launched his Web browser. He was found innocent of all charges.

The following year, Aaron Caffrey claimed that Trojans allowed others to control his computer and launch attacks against other systems, for which he’d been accused. Caffrey’s defense argued that although no Trojan had been found on his system during a forensic examination, a Trojan could nevertheless have been responsible. His argument was sufficient to get him acquitted.

In cases such as these, hindsight tells us that it would have been beneficial to have some information about running processes and network connections collected at the time the systems were seized, particularly if they were running when the investigator arrived on the scene. This information might have told us whether any unusual processes were running at the time and whether anyone had connected to the system to control it and upload files, direct attacks against other systems, or the like.

Performing live response means you will be collecting information about the state of systems while they are running, which includes information about processes and the files they are accessing, as well as information about network connections originating from and terminating at the system and which processes are using those network connections. In fact, live response is the only way you will be able to obtain this information, as it all disappears when the system is shut off.

As discussed previously, another reason for performing live response is that the system itself cannot be taken down without good (and I mean really good) reason. On larger critical systems, such as those used in e-commerce, downtime is measured in lost transactions or hundreds (even thousands) of dollars per minute. As the process of acquiring an image from the hard drives (most systems of this nature use more than one hard drive, in a RAID configuration) can often take considerable time, it’s preferable to have some solid facts to justify taking the system offline and out of service, if that is what is necessary. Doing so might not simply be a matter of a system administrator justifying these actions to an IT manager, but one of a CFO justifying them to the board of directors.

Yet another factor to consider is legislation requiring notification. Beginning with California’s SB 1386, companies that suffer security breaches in which personally identifiable information (PII) has been compromised must notify their customers who are California residents so that those customers can protect themselves from identity theft. At the time of this writing, other states have begun to follow California’s lead, and there is even talk of a federal notification law. This means companies that store and process sensitive information cannot simply remain silent about certain types of security breaches.

The term sensitive data really encompasses much more than what is defined in SB 1386; consider also California’s AB 1298, which provides a definition for protected health information (PHI). And regulatory bodies such as the PCI Council provide definitions of "sensitive data" as well (PCI covers credit card data). Not only that, but these regulatory bodies mandate the requirement to protect the data in question and even define some steps for doing so; the PCI Data Security Standard (DSS) Version 1.1 has a requirement (12.9) for a Computer Security Incident Response Plan (CSIRP), as well as a requirement (12.9.2) that the plan be tested annually.

In addition, companies storing and processing sensitive data (regardless of the definition followed) are going to want to know definitively whether sensitive information has been compromised during a security breach, because legislative and regulatory mandates require organizations to notify the individuals whose data was exposed. In some cases, companies that suffer a breach but cannot definitively determine what specific data was taken may be required to notify their customers that all available data may have been exposed. And in most cases, alerting customers that their personal information is now in the hands of some unknown individual (or, as could be the case, multiple unknown individuals) may have a significant, detrimental impact on the company. Customers could discontinue service and tell their friends and family in other states what happened. New customers might decide to sign up with a competitor. The loss of current and future revenue will change the face of the company and could lead to bankruptcy. So, why would a company that merely suspects it has been breached and has had sensitive data stolen dutifully notify its customers? Wouldn’t the company first want to know for sure that sensitive personal information about its customers has been compromised? Wouldn’t you?

Tools & Traps

Live Response and Sensitive Data

This is a trap that many organizations fall into, only they aren’t aware of it until after they’re in the trap. Frequently, organizations are simply unprepared for an incident, and in a few cases that I’ve seen, some organizations are prepared but their response processes were created by the IT department in complete isolation from any other department. As a result, malware is detected in an organization through some means, and the IT staff springs into action, locating and cleaning infected systems. At a meeting, someone mentions the work performed by the IT staff, and someone from the legal or compliance department hears about this and says, "These six infected systems were located in the area of the company that handles credit card data … was any of that data compromised or exposed?"

Let’s see; the IT staff identified each system, pulled them off the network, ran antivirus scans, perhaps connected to an isolated segment, and ran a complete Windows Update or simply wiped the drive and reloaded the operating system and as much of the user’s data as possible. Was sensitive data on the system? In many cases, we can say "yes". Was the data compromised? At this point, we don’t know and can’t determine the answer to that question, simply because we have no data to analyze. All of the data we would have had disappeared when the system was shut off. In such cases, the IT department’s response has exposed the organization to greater risk than the incident itself, as some regulatory bodies state that unless you can definitively state that sensitive data was not exposed, you must assume that it was, and therefore the organization would be obligated to report that all of the data was exposed.

Besides the "soft costs" of notification due to a data breach, such as losses due to a drop in customer confidence, consider also the "hard costs," those more quantifiable costs such as the actual costs of notifying customers of the exposure of their sensitive information, fines imposed by regulatory bodies, lawsuits brought on as a result of the exposure of the sensitive data, and so forth. Now, compare these to the "costs" associated with actually taking steps to protect the sensitive data stored and processed by your organization, which includes many of those things mandated by regulatory organizations, such as instituting and annually testing a CSIRP and being able to detect and respond to incidents. Part of this would include the ability to collect the necessary information, through live response, to determine what sensitive data, if any, may have been exposed.

Take, for example, an incident in which an "anonymous" individual on the Internet claims to have stolen sensitive information from an organization. This person claims that he broke into the organization over the Internet and was able to collect customer names, Social Security numbers, addresses, credit card data, and more. The organization’s senior management will want to know whether this was, in fact, the case, and if so, how this person was able to do what he claimed he’d done. Investigators will need to perform live response and examine systems for volatile information, such as running processes and network connections. They might also be interested in locating malware that is present in memory but doesn’t have any information (e.g., log files) or even so much as an executable image written to disk.

Yet another reason for performing live response is the use of this technique to triage an incident. Incident responders are faced not only with larger storage capacities, but also with larger and more dispersed infrastructures. E-commerce application systems may no longer consist of servers in two or three racks in a single data center; instead, they may comprise clusters, with the application cluster in one building and perhaps a database cluster in another. Corporate connectivity no longer consists of a single network segment in a building; even relatively simple networks can span several blocks in a city, or stretch between cities. As such, incident responders need a means by which they can sift through these systems and perform data reduction, determining and prioritizing affected systems. One way to do this is to use live-response techniques to locate artifacts pertinent to the incident, such as a file (or several files) on a system, a running service or process, a Registry key, and so forth. For example, if a data thief were found to have used a removable storage device such as an iPod, the entire infrastructure could be swept to determine every system to which that specific device had been connected, and when it had last been disconnected from any system. This single scan would greatly reduce the number of systems possibly affected by or involved in the incident from several thousand (or, in some cases, several hundred thousand) to only those to which the thief had actually connected the device.
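The data-reduction step described above amounts to filtering a large set of hosts down to those where an indicator was actually observed. The sketch below illustrates the shape of that triage pass; the host names, USB device serial numbers, and live-response results are all fabricated for the example, standing in for output a sweep tool would gather from each system.

```python
# Sketch of triage-by-artifact data reduction: keep only hosts where a
# specific indicator (here, a removable device's serial number) was seen.
# All data below is fabricated for illustration.
live_response_results = {
    "web01":  {"usb_serials": ["0A1B2C"], "procs": ["svchost.exe"]},
    "db01":   {"usb_serials": [],         "procs": ["sqlservr.exe"]},
    "wkstn7": {"usb_serials": ["0A1B2C", "FFEE01"], "procs": ["explorer.exe"]},
}

indicator = "0A1B2C"  # serial number of the suspect removable device

affected = [host for host, data in live_response_results.items()
            if indicator in data["usb_serials"]]
print(sorted(affected))  # → ['web01', 'wkstn7']
```

Three hosts become two; across a real enterprise, the same filter can reduce thousands of candidate systems to a handful worth full examination.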

What Data to Collect

At this point, we’re ready to look at the types of volatile information we can expect to see on a live system, and learn about the tools we could use to collect that information during live response.

When you’re performing live response, it’s likely that one of the first things you’ll want to collect is the contents of physical memory, or RAM. When you take Locard’s Exchange Principle into account, it’s pretty clear that by collecting the contents of RAM first, you minimize the impact you have on it. From that point on, you know that the other tools you run to collect other volatile information are going to be loaded into memory (as is the tool that you use to collect the contents of RAM), modifying the contents of memory.

We will discuss the topic of collecting and analyzing the contents of RAM in next topic. Here is a list of the specific types of volatile information we’ll look at in this topic:

■ System time

■ Logged-on user(s)

■ Open files

■ Network information

■ Network connections

■ Process information

■ Process-to-port mapping

■ Process memory

■ Network status

■ Clipboard contents

■ Service/driver information

■ Command history

■ Mapped drives

■ Shares

For each of these types of volatile information, we will look at some tools that we can use to retrieve the information from a Windows system. You will most likely notice that throughout this topic there is a tendency toward using command-line interface (CLI) tools over those with a graphical user interface (GUI). You might think that this is because CLI tools have a smaller "memory footprint," meaning that they consume less memory, rely on fewer dynamic link libraries (DLLs), and have less of an overall impact on the system. This is partially the case, but keep in mind that the actual "footprint" of any particular tool can be determined only through thorough testing of that tool. To date, I am not aware of any such testing being performed and made public.

Warning:

You should never make assumptions about a tool and its "memory footprint" when run on a system. Without thorough examination and testing, you’ll never know the kind of footprint an executable has on a system or the kinds of artifacts it leaves behind following its use.

The primary reason we focus on the use of CLI tools is that they are usually very simple, perform one basic, specific function, and are much easier to automate through the use of batch or script files. CLI tools can be bound together via batch files or scripting languages and their output is usually sent to the console (i.e., STDOUT) and can be redirected to a file or a socket. GUI tools, on the other hand, predominantly require you to save their output to a file, since they pretty much all have a File menu item with Save and Save As entries in the drop-down menu. Most programmers of GUI tools don’t necessarily develop them with incident response or forensics in mind. One of our goals is to minimize the impact of our investigative measures on a system (particularly for follow-on imaging and forensic analysis activities), so we want to avoid writing files to the system, in addition to getting the data we need off the system as quickly and efficiently as possible.
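The pattern just described (run a CLI tool, capture its STDOUT, and redirect the data off the console rather than saving files to the target) can be sketched as follows. The "tool" here is a stand-in command so that the example is self-contained and portable; in practice it would be a collection utility such as netstat.exe or tasklist.exe, and the output would be written to removable media or sent across a socket rather than to the target system's own drive.

```python
import os
import subprocess
import sys
import tempfile

# Sketch: invoke a CLI tool, capture its STDOUT, and redirect the output.
# The stand-in "tool" below just prints two lines, mimicking a process list.
tool_cmd = [sys.executable, "-c",
            "print('PID  Name'); print('1234 example.exe')"]

completed = subprocess.run(tool_cmd, capture_output=True, text=True, check=True)

# In the field this path would point at removable media or a network sink,
# never at the target system's own drive.
out_path = os.path.join(tempfile.mkdtemp(), "collection_output.txt")
with open(out_path, "w") as fh:
    fh.write(completed.stdout)

print(completed.stdout.splitlines()[0])  # → PID  Name
```

Because every tool in a batch follows the same capture-and-redirect shape, a whole collection run can be scripted and its output bundled for offline analysis.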

Now, this is not to say that GUI tools absolutely cannot be used for live-response activities. If there’s a GUI tool that you find absolutely perfect for what you need, then by all means, use it. But consider ahead of time how you’re going to get that data off the system.

Regardless of the tools you decide to use, always be sure to check the license agreement before using them. Some tools can be used as you like, but others require a fee for use in a corporate environment. Reading and heeding these agreements in advance can help you avoid major headaches.
