At times during an investigation you may come across a suspicious executable file on which you would like to perform some analysis to get an idea of what it does or what function it performs. Many times, an intruder may leave scripts or configuration files behind, and these files are generally text files that can be opened and viewed. In the case of scripts, some knowledge of programming may be necessary to fully understand the function of the file.
In a previous topic, we discussed file signature analysis, a method for determining whether a file’s extension matches the file’s actual type. Changing a file’s name and extension is one of the simplest means of obfuscation an attacker uses to hide or mask the presence of files on a compromised system; the attacker can (many times, correctly) assume that if the administrator discovers the file, she won’t be very eager to access it and determine its true nature if the file has an extension such as .dll.
In this topic, we will discuss ways in which you, as the investigator, can attempt to determine the nature of an executable file. I will present tools and techniques you can use to gather information about an executable file, and get clues about its purpose. This discussion will not be simply about malware analysis; rather, I will present techniques for analyzing executable files in general, of which malware may be just one class of executable file. In this topic, we will discuss several analysis techniques, but we will stop short of any discussion of disassembling the code, or using tools such as IDA Pro (www.hex-rays.com/idapro/).
Before we begin, however, you may be asking why we are doing this. What’s the purpose of analyzing malware files? Isn’t that what antivirus vendors do? Well, very recent history has shown that this may not be entirely the case. For example, the end of 2008 and beginning of 2009 saw the proliferation of the Conficker worm (a.k.a. Downadup) through corporate networks, due in large part to a lack of security updates (specifically, the update for the vulnerability described in Microsoft’s MS08-067 security bulletin). Organizations employing enterprisewide antivirus solutions found themselves at the mercy of the worm when a new, as-yet-unseen variant was released. However, it was a lack of understanding of the nature of malware in general, as well as the nearly complete reliance on an antivirus solution, that caused the most trouble for corporate infrastructures. When first responders can perform some modicum of analysis immediately, response and recovery times improve for the organization as a whole, which in turn helps the response team determine the root cause of the infection. Quicker response also directly reduces the potential for access to sensitive data by an intruder (via a Trojan, backdoor, or bot) or by the malware itself, as the analysis of the malware will lead to an understanding of its infection vector (how it got in), its persistence mechanism (how it stays running on systems), and any artifacts it may leave. These artifacts can then be used to locate other infected systems, particularly when the malware variant is not recognized by antivirus applications. This capability can mean the difference between taking decisive, reasoned, and informed action now and waiting several days for a vendor representative to show up, collect samples, and then roll out an updated signature file.
That being said, let’s look at how to analyze executable files.
Static analysis consists of collecting information about and from an executable file without actually running or launching the file in any way. When most people open an executable file in Notepad (I’ve done this many times to illustrate something for a client) or even a hex editor, all they see is a bunch of binary data that appears to be meaningless garbage. Now and again, you may see a word that you recognize, but for the most part that word has no context; it could be anything. Investigators need to keep in mind that executable files have to follow certain rules with regard to their format, as there are specific things we can expect to see in an executable file found on a Windows system. Understanding those rules lets us delve into the apparent gobbledygook (aren’t you glad I’m using technical engineering terms as I write this?) of executable files and extract meaningful information.
Before we dig in to an executable file, however, there are a couple of things we need to talk about.
Locating Files to Analyze
One of the questions I am asked fairly frequently is how do you locate malicious or suspicious files on a system or within an image acquired from a system?
As we discussed in a previous topic, one way to locate these files requires that you collect a memory dump from the system, parse the memory dump with one of the tools mentioned in that topic, and locate a process associated with the suspicious activity using Aaron Walters’ Volatility tool set. Once you find that process and parse the EPROCESS block, you will have the path to the executable file image. You can then locate that file within the file system in the system image based on that path.
Should you find a suspicious Registry entry—say, in the Run key—you can then simply locate that file within the system image.
Using remote live-response techniques, as discussed in a previous topic, you can reach out to remote systems to make the necessary queries to search those Registry autostart locations. Another method for doing this is to deploy F-Response on the remote system, and once you have the remote drive connected, use a tool such as RegRipper to collect information from the remote system’s Registry, just as you would had you extracted the hive files for analysis. You can also automate the data extraction by putting the necessary commands to run rip.exe, the command-line interface (CLI) version of RegRipper, into a batch file.
Yet another means of locating malicious or suspicious executable files on a system image is to mount the image as a read-only drive letter on your analysis system using Smart Mount (www.asrdata.com/SmartMount) or Mount Image Pro, and then scan that drive letter with an antivirus scanning application. In fact, given that there are many instances where actual malicious files are not found by one antivirus application or another, you may want to scan the drive letter with more than one antivirus scanner.
Claus Valca of the Grand Stream Dreams blog (http://grandstreamdreams.blogspot.com/) posts now and again on various antivirus applications that are available for use. Some of the applications he mentions are free but a more fully featured version of the application is available for a fee. However, in many cases, the freely available version provides the ability to scan for malware, which is what we’re really interested in here. Some of the antivirus scanning applications that Claus mentions can be configured or were specifically written to be run from a thumb drive, which allows you to download, update, and use the application without having to have them all restricted to a single system. Go to his blog and search for anti-virus software and malware tools to see a list of the blog articles on those subjects.
However, even with the state of antivirus scanning applications the way they are today, they have one principal drawback: Because these applications are signature-based, all the malware authors need to do is make a small change to their software, and then recompile and redeploy it, and the malware may not be detected. I’ve submitted malware to various sites (such as VirusTotal.com) for review, and heard of others doing the same, only to find that the executable file was not detected or identified by 35 (or more) different antivirus scanning applications. As such, we need to develop different means by which we can identify malicious executable files, on systems or within system images. Another technique beyond those mentioned already would be to perform a deeper version of file signature analysis; that is, rather than simply looking for the letters MZ in the first two bytes of a potential executable file and then comparing that to the file extension, hoping to find "exe" or "dll" or another valid extension, we should dig a little deeper. Aside from the initial file signature, does the rest of the file have the appropriate file structure for that type of file? You can also verify that files are digitally signed using Microsoft’s sigcheck.exe (http://technet.microsoft.com/en-us/sysinternals/bb897441.aspx), or use WFPCheck, as described in a previous topic, to attempt to locate files protected by Windows File Protection (WFP) that were replaced or modified.
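As a rough illustration of what a deeper version of file signature analysis might look like, the following Python sketch checks not only for the MZ signature but also that the DOS header points to a valid PE signature. The function name looks_like_pe is hypothetical, and this is only a minimal check under the assumption that the whole file fits in memory; a file that passes may still be malformed elsewhere.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with MZ and its e_lfanew field
    points at the 4-byte PE signature ("PE" followed by two nulls)."""
    # The DOS header is 64 bytes long and begins with the letters MZ.
    if len(data) < 64 or data[:2] != b"MZ":
        return False
    # e_lfanew is a 4-byte little-endian value at offset 0x3C that
    # holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The signature has to fall inside the file.
    if e_lfanew + 4 > len(data):
        return False
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python pecheck.py suspicious.dll
    for name in sys.argv[1:]:
        with open(name, "rb") as fh:
            print(name, looks_like_pe(fh.read()))
```

A file with an executable extension that fails this check, or a file with an innocuous extension that passes it, is worth a closer look.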
Regardless of the means you use to locate and identify executable files that may be suspicious or malicious (ideally employing more than one of the aforementioned techniques), you should be sure to thoroughly document what you do, as well as the results of scans or searches.
Documenting the File
Before analyzing or digging into the executable file in any way, the first thing you should do is document it. However, it’s a widely held belief (almost to the point of being an urban legend) that technically oriented folks hate to document anything. Well, this is true, at least in part. I can’t tell you the number of times that I’ve responded to an incident (on-site or remote) and been told by the responders, "We found a file." When asked, "Where did you find the file?" the responders replied with wide-eyed, thousand-yard stares. Where the file was found can be extremely useful in adding context to other information, and helping you figure out what happened.
So, the first thing you need to do is document the full path and location of the file you found: what system it was on, what the complete path to the file was, and who found it and when.
One thing that many technical folks do not seem to realize is that on a computer system (not just on Windows) a file can be named just about anything. Monitor any of the public listservs for a period of time and you’ll find posts where someone will say, "I found this file on my system and a Google search tells me that it’s harmless…" Searching for information about a file based solely on the name of the file can turn up some interesting or useful information, but that information should not be considered the end of the investigation. I responded to an incident once where the on-site information technology (IT) staff had located several files on an infected system, and then Googled for information about each file. Typing in the name of one of the files they found, they saw that the file was legitimate, provided by Microsoft, and they ended their investigation there. However, by examining the file further using techniques presented in this topic, I was able to determine that the file was, in fact, the malware to which I was responding.
Depending on how the suspicious file was originally located, you may already have the documentation for the file available. If you responded to a live system, for example, and used one (or more) of the response techniques mentioned in a previous topic, it is likely that you already have documentation, such as the full path to the file, available. The same is true if you located the file in a system image using ProDiscover or some other forensic analysis tool or technique.
Another aspect of the file that is important to document is the operating system and version on which it was located. The Windows operating systems vary between versions, and even between Service Packs within the same version. The effect that the malware has on a target may depend on the version of Windows on which it was located. For example, the Teddy Bear virus hoax e-mail identified the jdbgmgr.exe file as being malware (it was referred to as the "Teddy Bear" virus because the icon for this file is a teddy bear) and told the reader to immediately delete the file. If this was done on Windows NT 4.0, the file would be deleted. However, on Windows 2000, WFP would have immediately replaced the file. The set of files protected by WFP differs between Windows 2000 and Windows XP. Back in 2000, Benny and Ratter released the W32.Stream proof of concept virus that made use of NTFS alternate data streams (ADSs). If the virus made its way onto a Windows system with the file system formatted as FAT/FAT32, the virus appeared to behave differently, but only because the FAT file system does not support ADSs.
Besides noting where within the file system the file was found and on which version of Windows during your response procedures, you should also collect additional information about the file, such as the file’s MAC times and any references to that file within the file system (e.g., shortcuts in a user’s StartUp folder) or Registry, that you may notice during your initial examination.
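The items to record can be gathered into a simple structure. The sketch below is one way to do it in Python, ideally against a file in a read-only mounted image rather than on the live system. The function name document_file and the field names are hypothetical, and the found_on and found_by values are supplied by the examiner, not detected. Note that the meaning of st_ctime depends on the platform running the script (creation time on Windows, metadata change time on most Unix-like systems), so the field is labeled accordingly.

```python
import os
from datetime import datetime, timezone

def _utc(ts: float) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

def document_file(path: str, found_on: str, found_by: str) -> dict:
    """Collect basic documentation for a suspicious file.

    found_on: examiner's note of the system and Windows version
              where the file was located (not auto-detected).
    found_by: who located the file.
    """
    st = os.stat(path)
    return {
        "full_path": os.path.abspath(path),
        "found_on": found_on,
        "found_by": found_by,
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
        "size_bytes": st.st_size,
        "modified_utc": _utc(st.st_mtime),
        "accessed_utc": _utc(st.st_atime),
        # Creation time on Windows; metadata change time elsewhere.
        "created_or_changed_utc": _utc(st.st_ctime),
    }
```

The resulting dictionary can be written to your case notes as is; references to the file in shortcuts or the Registry still have to be recorded separately.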
Investigators need to be very careful when initially approaching a system, particularly one that is still running. Earlier in this topic, we discussed Locard’s Exchange Principle and the fact that ASCII and Unicode text searches do not always work on searches of the Registry, as some values are stored in binary format. Anything an investigator does on a system will leave artifacts on that system, so if you find an unusual file, limit your searches for extra information about the file as much as possible. Any activities you do engage in should be thoroughly documented.
The more complete your documentation, the better. It is a good idea to make a habit of doing this for every investigation, as it will save you a great deal of heartache in the future. Further, this constitutes a "best practice" approach.
Another step you will need to follow to document the file is to calculate cryptographic hashes for the file. Cryptographic hashes are used in information security and computer forensics to ensure the integrity of a file; that is, that no changes have been made to the file. One popular hash algorithm is the MD5 function, which takes input of arbitrary length and produces a 128-bit output hash which is usually represented in 32 hexadecimal characters. Any changes to the input, even switching a single bit, will result in a different MD5 hash. Although deficiencies in the MD5 algorithm that allow for collisions have been noted (http://en.wikipedia.org/wiki/Md5), the algorithm is still useful for computer forensics. Another popular hash algorithm is SHA-1 (http://en.wikipedia.org/wiki/Sha-1). Organizations such as the National Software Reference Library (NSRL) at NIST use the SHA-1 algorithm when computing cryptographic hashes for the Reference Data Set (RDS) CDs. Reference sets such as this allow investigators a modicum of data reduction by filtering out "known-good" (legitimate) and "known-bad" (known malware) files from the data set.
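For readers who want to see the mechanics, here is a minimal Python sketch that computes both the MD5 and SHA-1 hashes of a file in a single pass using the standard hashlib module. The function name hash_file and the chunk size are illustrative choices, not part of any tool discussed here.

```python
import hashlib

def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5 and SHA-1 hex digests of the file at path."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python hashfile.py suspicious.exe
    for name in sys.argv[1:]:
        print(name, hash_file(name))
```

Recording both values in your documentation lets you compare against reference sets that use either algorithm.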
Once you’ve calculated the MD5 hash of an executable file that you think may be malicious in nature, you can go to the VirusTotal.com Web site and post either the file itself or the MD5 hash for review. If you post the executable file for analysis, the site scans the file with about 35 different antivirus scanning applications. If you submit the MD5 hash, it is compared to the database of hashes maintained at the site. This site is a great resource for those with limited access to more than just one or two scanning applications, or for those who’d like to get 34 second opinions.
Another useful hashing algorithm was implemented by Jesse Kornblum in his tool called ssdeep (which is based on spamsum by Dr. Andrew Tridgell), available from http://ssdeep.sourceforge.net/. Ssdeep.exe computes "context triggered piecewise hashes" (www.dfrws.org/2006/proceedings/12-Kornblum.pdf), which means that instead of computing a single cryptographic hash across the entire file start to finish, it computes a hash using a piecewise approach, hashing one variably sized section at a time, with the section boundaries triggered by the content of the file itself. Not only does this technique produce a hash that can then later be used to verify the integrity of the original file, but it can also be used to see how similar two files may be. For example, if a Word document is hashed using ssdeep.exe and then modified slightly (adding/removing text, changing formatting, etc.), and then the hash is recomputed, ssdeep.exe will be able to show how similar the files are. You can use this technique with other file types, as well, such as images, videos, and audio files.
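The piecewise idea can be illustrated with a toy Python sketch. To be clear, this is not the ssdeep algorithm and its output is not compatible with ssdeep.exe; the window size, block size, trigger rule, and function names are all simplifying assumptions chosen for readability. It only shows the general shape of the technique: a rolling value over recent bytes decides where each piece ends, each piece contributes one character to the signature, and two signatures can then be compared for similarity.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
WINDOW = 7  # number of trailing bytes the rolling value looks at

def _piece_char(piece: bytes) -> str:
    """Reduce one piece to a single signature character."""
    return ALPHABET[hashlib.md5(piece).digest()[0] % 64]

def piecewise_signature(data: bytes, block_size: int = 64) -> str:
    """Toy context-triggered piecewise signature (NOT ssdeep)."""
    signature = []
    window = bytearray()
    piece_start = 0
    for i, byte in enumerate(data):
        window.append(byte)
        if len(window) > WINDOW:
            del window[0]
        # Toy rolling value: sum of the last WINDOW bytes.
        if sum(window) % block_size == block_size - 1:
            # The content triggered a boundary: close this piece.
            signature.append(_piece_char(data[piece_start:i + 1]))
            piece_start = i + 1
    if piece_start < len(data):
        # Whatever is left over forms the final piece.
        signature.append(_piece_char(data[piece_start:]))
    return "".join(signature)

def similarity(sig_a: str, sig_b: str) -> float:
    """Score two signatures from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, sig_a, sig_b).ratio()
```

Because the boundaries depend only on nearby content, a small edit tends to change only the signature characters for the pieces around it, which is what makes a similarity comparison meaningful. For real casework, use ssdeep itself.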
Once you’ve documented information about the file, you can begin gathering information from within the file itself.
One of the first steps of static analysis that most investigators engage in is to scan the suspicious file with antivirus software. This is an excellent way to start, but do not be surprised if the antivirus scan comes up with nothing definitive. New malcode is being released all the time. In fact, one antivirus company released a report in January 2007, looking back over the previous year, in which it identified a total of 207,684 different threats that its antivirus product protected against, and 41,536 new pieces of malcode that its product detected. Scanning the suspicious file may provide you with insight as to the nature of the file, but do not be overly concerned if the response you receive is "no virus detected". Scanning with multiple antivirus engines may provide a more comprehensive view of the file, as well.
The next step that most investigators will take with a suspicious executable file is to run it through strings.exe (http://technet.microsoft.com/en-us/sysinternals/bb897439.aspx), extracting all ASCII and Unicode strings of a specific length. This can be very helpful, in that the investigator may get an idea of the nature of the file from the strings within the file. The latest version of strings.exe (as of this writing) allows you to search for both ASCII and Unicode strings, as well as print the offset of where within the file the string is located. This offset will tell you which section the string appears in, and provides context to the string (we will discuss sections and section headers later in this topic). You can even run the strings.exe program to search for specific strings in all files, using the example command line listed at the Web site for the application.
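To show what a strings-style extraction involves, the following Python sketch pulls printable ASCII and UTF-16LE ("Unicode") strings of a minimum length out of a byte buffer, along with the offset of each. It is a simplified stand-in, not a reimplementation of strings.exe; the function name extract_strings and the "A"/"U" labels are illustrative choices.

```python
import re

def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return sorted (offset, kind, text) tuples, where kind is
    "A" for ASCII and "U" for UTF-16LE strings of printable chars."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Runs of printable ASCII characters each followed by a null
    # byte, which is how such text looks when stored as UTF-16LE.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = []
    for m in ascii_re.finditer(data):
        found.append((m.start(), "A", m.group().decode("ascii")))
    for m in wide_re.finditer(data):
        found.append((m.start(), "U", m.group().decode("utf-16-le")))
    return sorted(found)

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python getstrings.py suspicious.exe
    with open(sys.argv[1], "rb") as fh:
        for offset, kind, text in extract_strings(fh.read()):
            print("0x%08x %s %s" % (offset, kind, text))
```

Keeping the offset with each string is what later lets you tie the string to a particular section of the file.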
Back "in the day," I was assisting with an investigation of a file taken from a system that was spewing traffic out onto the Internet from within a corporate infrastructure. The file turned out to be the IE0199 virus (www.f-secure.com/v-descs/antibtc.shtml) that would infect a system and start sending traffic to the Bulgarian telecommunications infrastructure. We found ASCII strings within the file that made up a "manifesto," and fortunately someone on our team had received Russian language training in the U.S. Army and was able to interpret what we’d found. Evidently, the author was upset with the prices charged for Internet access in Bulgaria, and wanted to conduct a denial of service (DoS) attack against the infrastructure.
Another useful utility for searching for strings in a binary file is BinText, which used to be available from Foundstone (owned by McAfee, Inc.). BinText would locate all ASCII, Unicode, and resource strings within a binary file and display them within a nice graphical user interface (GUI), along with the offset within the binary file where the string was found. Figure 6.1 illustrates several of the strings found in notepad.exe.
Figure 6.1 Notepad.exe Open in BinText
Although the strings found in the file do not paint a complete picture of what the file does, they can give you clues. Further, the strings may have no context other than their location. For example, in Figure 6.1, we see that the strings are Unicode (see the "U" on the left of the interface) and that they appear to be part of the file versioning information (more on this later in this topic). Other strings may not have this same level of context within the file. In addition, strings that appear odd or unique within the file (in all seriousness, I actually found the string "supercalifragilisticexpialidocious" in a file once; honest) can be used for searches in other files, as well as on the Internet. The results of these searches may provide you with clues to assist in further analysis (either static or dynamic) of the executable file.
A great many Web sites are available on reverse engineering malware or even legitimate applications, and oddly enough, they all point to some of the same core techniques for collecting information from executables, as well as use some of the very same tools. Two of the tools that we’ll be using throughout the next sections of this topic are pedump.exe and peview.exe.
In February 2002, the first of two articles by Matt Pietrek, titled "An In-Depth Look into the Win32 Portable Executable File Format," was published. In these articles, Matt not only described the various aspects of the portable executable (PE) file format in detail, but also provided a CLI tool called pedump.exe (found at www.wheaty.net) that you can use to extract detailed information from the header of a PE file. The information extracted by pedump.exe is sent to STDOUT, so it can be easily viewed at the console or redirected to a file for later analysis.
You can find part 1 of Matt Pietrek’s articles at http://msdn.microsoft.com/en-us/magazine/cc301805.aspx. You can find part 2 at http://msdn.microsoft.com/en-us/magazine/cc301808.aspx.
Another useful tool for exploring the internals of Windows PE files is peview.exe (www.magma.ca/~wjr/), from Wayne Radburn. Peview.exe is a GUI tool that allows you to see the various components of the PE header (and the remaining portions, as well) in a nicely laid out format. The most current version of peview.exe available at the time of this writing is Version 0.96, and that version does not include the ability to save what is viewed in the GUI to a file.
Neither of these tools is provided on the accompanying DVD, due to licensing and distribution issues. Besides, going to the Web sites to obtain the tools will ensure that you have the latest available versions. The DVD does, however, contain Perl code for accessing the PE file structures. The Perl script pedmp.pl uses the File::ReadPE Perl module to access the contents of the PE header and to parse the various structures. The Perl script and module are provided for educational and instructional purposes so that you can see what goes on behind the scenes with the other tools. Also, the Perl code is written to be as platform-independent as possible; that is, when byte values are retrieved from the executable file, the Perl unpack() function is used with unpack strings that force the values into little-endian order. This way, you can run the scripts on Windows, Linux, and even Mac OS X (which is beneficial for analysis, as it is unlikely that on Linux or Mac OS X you will "accidentally" execute Windows malware and infect the system), so you are not restricted to performing analysis on a single platform.
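The same little-endian approach can be sketched in Python, where the "<" prefix in a struct format string plays the role of the little-endian unpack strings in Perl. The example below is not the pedmp.pl script or the File::ReadPE module; it is a separate, minimal illustration that reads a few fields from the file header that follows the PE signature. The function name read_file_header and the returned key names are hypothetical.

```python
import struct

def read_file_header(data: bytes) -> dict:
    """Parse the 20-byte file header that follows the PE signature.

    All values are unpacked as little-endian ("<"), so the result is
    the same regardless of the platform running the script.
    """
    if len(data) < 64 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset of the PE signature, stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 24 > len(data):
        raise ValueError("PE header lies outside the file")
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # Two 16-bit values, three 32-bit values, two 16-bit values.
    (machine, sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "number_of_sections": sections,
        "time_date_stamp": timestamp,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }
```

Because nothing here executes the file, a script like this can be run on Linux or Mac OS X just as safely as on Windows.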