Executable File Analysis (Windows Forensic Analysis) Part 2

The PE Header

At www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx, Microsoft has thoroughly documented the format of PE files (as well as the Common Object File Format, or COFF, found on VAX/VMS systems), and has made that documentation public. Microsoft has also made most of the structures used within the file headers publicly available, as part of the documentation for the ImageHlp (http://msdn2.microsoft.com/en-gb/library/ms680198.aspx) API structures. With this and other resources, we can understand the structure of a PE file, delve into its depths, and extract information that may be of use to us during an investigation.

A PE file can be broken down into several areas of interest (I hesitate to say "sections," as we will be using this term for a specific purpose in our discussion). The first, and perhaps most important, part of a PE file (if not the most important, then one of the best bits of geek trivia) is the file signature. For executable files on Windows systems, the file signature consists of the letters MZ, found in the first two bytes of the file. As noted earlier in the topic, these two letters are the initials of Mark Zbikowski (http://en.wikipedia.org/wiki/Mark_Zbikowski), the Microsoft architect credited with designing the executable file format. However, as you’ll see, it takes much more than those two letters and an ".exe" at the end of the file name to make a file executable.


Mark’s initials are the signature for a 64-byte structure called the IMAGE_DOS_HEADER. The important elements of this structure are the first two bytes (the "magic number" 0x5a4d, which is stored in little-endian order as the bytes 4D 5A, or MZ) and the last DWORD (4-byte) value, which is referred to as e_lfanew. This value is defined in the ntimage.h header file as the file address (offset) of the new EXE header; that is, the offset at which we should find the signature for the beginning of the IMAGE_NT_HEADERS structure. The e_lfanew value points to the location of the PE header, enabling Windows to properly execute the image file. Figure 6.2 illustrates these values from an executable file opened in a hex editor.

Figure 6.2 IMAGE_DOS_HEADER Structure Viewed in a Hex Editor


In the example illustrated in Figure 6.2, the IMAGE_NT_HEADERS structure should be located at offset 0xB8 (184 in decimal notation) within the file. The IMAGE_NT_HEADERS structure consists of a signature and two additional structures, IMAGE_FILE_HEADER and IMAGE_OPTIONAL_HEADER. The signature for a PE header is, sensibly enough, "PE" followed by two zero values (the signature value is a DWORD, or four bytes in length, and appears as "PE\00\00"), and is illustrated in Figure 6.3.

Figure 6.3 IMAGE_NT_HEADERS Signature Value
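The two signature checks described above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (a 64-byte IMAGE_DOS_HEADER that begins with MZ and ends with the e_lfanew DWORD); the function name find_pe_header is hypothetical, and the sample buffer is synthetic, built to mirror the 0xB8 offset from Figure 6.2.

```python
import struct

DOS_HEADER_SIZE = 64  # size of IMAGE_DOS_HEADER, per the text


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if len(data) < DOS_HEADER_SIZE or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is the last DWORD of the DOS header (bytes 60-63), little-endian.
    (e_lfanew,) = struct.unpack_from("<I", data, DOS_HEADER_SIZE - 4)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature at e_lfanew")
    return e_lfanew


# Synthetic buffer (not a real executable) with e_lfanew set to 0xB8.
sample = bytearray(0xB8 + 4)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 60, 0xB8)
sample[0xB8:0xBC] = b"PE\x00\x00"
print(hex(find_pe_header(bytes(sample))))
```

A file that carries an ".exe" extension but fails either check is not a PE file, whatever its name suggests.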


The IMAGE_FILE_HEADER (http://msdn.microsoft.com/en-gb/library/ms680313.aspx) structure is contained in the 20 bytes immediately following the "PE\00\00" signature, and includes several values that can be useful to investigators. Table 6.1 lists the values and descriptions of the IMAGE_FILE_HEADER structure.

Table 6.1 IMAGE_FILE_HEADER Structure Values

| Size | Name | Description |
| --- | --- | --- |
| 2 bytes | Machine | Designates the architecture type of the computer; the program can be run only on a system that emulates this type |
| 2 bytes | Number of Sections | Designates how many sections (IMAGE_SECTION_HEADERS) are included in the PE file |
| 4 bytes | TimeDateStamp | The time and date that the linker created the image, in UNIX time format (i.e., number of seconds since midnight, 1 Jan 1970). This normally indicates the system time on the programmer’s computer when he compiled the executable |
| 4 bytes | Pointer to Symbol Table | Offset to the symbol table (0 if no COFF symbol table exists) |
| 4 bytes | Number of Symbols | Number of symbols in the symbol table |
| 2 bytes | Size of Optional Header | Size of the IMAGE_OPTIONAL_HEADER structure; determines whether the structure is for a 32-bit or 64-bit architecture |
| 2 bytes | Characteristics | Flags designating various characteristics of the file |
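As a rough sketch of how the seven values in Table 6.1 map onto the 20 bytes that follow the signature, the following Python unpacks them with the standard struct module. The names FileHeader, parse_file_header, and link_time are my own, and the header bytes are made up for illustration (0x014C is the Intel 386 machine type); they are not taken from the sample application.

```python
import struct
from collections import namedtuple
from datetime import datetime, timezone

# Field order and sizes follow Table 6.1 (2+2+4+4+4+2+2 = 20 bytes).
FileHeader = namedtuple(
    "FileHeader",
    "Machine NumberOfSections TimeDateStamp PointerToSymbolTable "
    "NumberOfSymbols SizeOfOptionalHeader Characteristics",
)
FILE_HEADER_FORMAT = "<HHIIIHH"


def parse_file_header(data: bytes, pe_offset: int) -> FileHeader:
    """Unpack the 20 bytes that follow the 4-byte PE signature."""
    fields = struct.unpack_from(FILE_HEADER_FORMAT, data, pe_offset + 4)
    return FileHeader._make(fields)


def link_time(header: FileHeader) -> datetime:
    """Interpret TimeDateStamp as seconds since 1 Jan 1970 (UTC)."""
    return datetime.fromtimestamp(header.TimeDateStamp, tz=timezone.utc)


# Made-up header values for illustration only.
raw = b"PE\x00\x00" + struct.pack(
    FILE_HEADER_FORMAT, 0x014C, 3, 1000000000, 0, 0, 0xE0, 0x010F
)
hdr = parse_file_header(raw, 0)
print(hdr.NumberOfSections, link_time(hdr).isoformat())
```

Converting the timestamp with an explicit UTC time zone avoids having the analyst's local time zone leak into the reported link time.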

Figure 6.4 illustrates the IMAGE_FILE_HEADER of a sample application opened in PEView.

Figure 6.4 IMAGE_FILE_HEADER Viewed in PEView


For forensic investigators, the TimeDateStamp value may be of significance when investigating an executable file, as it shows when the linker created the image file (investigators should also be aware that this value can be modified with a hex editor without having any effect on the execution of the file itself). This normally indicates the system time on the programmer’s computer when the programmer compiled the executable and may be a clue as to when this program was constructed. When performing analysis of the file, the number of sections that are reported in the IMAGE_FILE_HEADER structure should match the number of sections within the file.

Also, if the file extension has been altered, the Characteristics value will provide some clues as to the true nature of the file; for instance, if the IMAGE_FILE_DLL flag (0x2000) is set within the Characteristics value illustrated in Figure 6.4, the executable file is a dynamic link library (DLL) and cannot be run directly. One class of files that usually occur as DLLs are browser helper objects, or BHOs. These are DLLs that are loaded by Internet Explorer and can provide all manner of functionality. In some instances, these DLLs are legitimate (such as the BHO used to load Adobe’s Acrobat Reader when a PDF file is accessed via the browser), but in many cases these BHOs may be spyware or adware. The MSDN page for the IMAGE_FILE_HEADER provides a list of possible constant values that can make up the Characteristics field.
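Because the Characteristics value is a set of bit flags, checking for IMAGE_FILE_DLL comes down to a bitwise AND. The sketch below decodes a handful of the flags listed on the MSDN page; the table is deliberately incomplete, and the helper names are hypothetical.

```python
# A subset of the IMAGE_FILE_* Characteristics flags; not the full list.
CHARACTERISTICS_FLAGS = {
    0x0001: "IMAGE_FILE_RELOCS_STRIPPED",
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0020: "IMAGE_FILE_LARGE_ADDRESS_AWARE",
    0x0100: "IMAGE_FILE_32BIT_MACHINE",
    0x1000: "IMAGE_FILE_SYSTEM",
    0x2000: "IMAGE_FILE_DLL",
}


def decode_characteristics(value: int) -> list:
    """Return the names of the known flags that are set in value."""
    return [name for bit, name in sorted(CHARACTERISTICS_FLAGS.items())
            if value & bit]


def is_dll(value: int) -> bool:
    """True if the IMAGE_FILE_DLL bit (0x2000) is set."""
    return bool(value & 0x2000)


# 0x2102 is a made-up value chosen so that three flags are set.
print(decode_characteristics(0x2102), is_dll(0x2102))
```

A file named "report.exe" whose Characteristics value has the 0x2000 bit set would, by this check, be a DLL regardless of its extension.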

The value that gives the size of the IMAGE_OPTIONAL_HEADER structure (http://msdn.microsoft.com/en-gb/library/ms680339.aspx) is important for file analysis, as it hints at whether the optional header is for a 32-bit or a 64-bit application (the 64-bit form of the structure is larger). The definitive indicator is the "magic number" of the IMAGE_OPTIONAL_HEADER structure, which is located in the first two bytes of the structure; a value of 0x10b indicates a 32-bit executable image, a value of 0x20b indicates a 64-bit executable image, and a value of 0x107 indicates a ROM image. In our discussion, we will focus on the IMAGE_OPTIONAL_HEADER32 structure for a 32-bit executable image. Figure 6.5 illustrates the IMAGE_OPTIONAL_HEADER of a sample application viewed in PEView.

Figure 6.5 IMAGE_OPTIONAL_HEADER Viewed in PEView
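The magic number check can be sketched as follows. The only layout facts used are the ones already given: a 4-byte signature, a 20-byte IMAGE_FILE_HEADER, and a magic number in the first two bytes of the optional header. The function name and the test buffer are illustrative assumptions.

```python
import struct

OPTIONAL_HEADER_MAGIC = {
    0x10B: "32-bit executable image (PE32)",
    0x20B: "64-bit executable image (PE32+)",
    0x107: "ROM image",
}


def optional_header_kind(data: bytes, pe_offset: int) -> str:
    # The optional header starts after the 4-byte signature and the
    # 20-byte IMAGE_FILE_HEADER; its first two bytes are the magic number.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 4 + 20)
    return OPTIONAL_HEADER_MAGIC.get(magic, "unknown magic 0x%x" % magic)


# Synthetic bytes: signature, an empty file header, then the magic number.
buf = b"PE\x00\x00" + bytes(20) + struct.pack("<H", 0x10B)
print(optional_header_kind(buf, 0))
```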


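The values visible in Figure 6.5 indicate that the sample application was designed for the Windows GUI subsystem. The DLL Characteristics value of 0000 means only that none of the optional DLL characteristics flags are set; whether a file is a DLL is determined by the IMAGE_FILE_DLL flag in the Characteristics value of the IMAGE_FILE_HEADER, as described earlier.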

As you saw earlier, the size of the IMAGE_OPTIONAL_HEADER structure is stored in the IMAGE_FILE_HEADER structure. The IMAGE_OPTIONAL_HEADER itself contains several values that may be useful for certain detailed analyses of executable files, but that level of analysis is beyond the scope of this topic.

However, a value of interest within the IMAGE_OPTIONAL_HEADER is the Subsystem value, which tells the operating system which subsystem is required to run the image. Microsoft even provides a Knowledge Base article (90493, http://support.microsoft.com/kb/90493) that describes how to determine the subsystem of an application, and includes sample code. Note that the MSDN page of the IMAGE_OPTIONAL_HEADER structure provides several more possible values for the Subsystem than the Knowledge Base article.

Another value that investigators will be interested in is the AddressOfEntryPoint value within the IMAGE_OPTIONAL_HEADER. This is a pointer to the entry point function relative to the image base address. For executable files, this is where the code for the application begins. The importance of this value will become apparent later in this topic.
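Both values can be read directly from the optional header once its starting offset is known. In the sketch below, the offsets 16 (AddressOfEntryPoint) and 68 (Subsystem) are taken from the published structure layout and are the same in the 32-bit and 64-bit forms; the function name and the abbreviated subsystem table are my own.

```python
import struct

# Three common Subsystem values; the MSDN page lists more.
SUBSYSTEMS = {
    1: "IMAGE_SUBSYSTEM_NATIVE",
    2: "IMAGE_SUBSYSTEM_WINDOWS_GUI",
    3: "IMAGE_SUBSYSTEM_WINDOWS_CUI",
}

# Offsets from the start of IMAGE_OPTIONAL_HEADER.
ENTRY_POINT_OFFSET = 16
SUBSYSTEM_OFFSET = 68


def entry_point_and_subsystem(data: bytes, opt_offset: int):
    """Return (AddressOfEntryPoint RVA, subsystem name)."""
    (entry_rva,) = struct.unpack_from("<I", data, opt_offset + ENTRY_POINT_OFFSET)
    (subsystem,) = struct.unpack_from("<H", data, opt_offset + SUBSYSTEM_OFFSET)
    return entry_rva, SUBSYSTEMS.get(subsystem, "other (%d)" % subsystem)


# Synthetic optional header with made-up values.
opt = bytearray(96)
struct.pack_into("<I", opt, ENTRY_POINT_OFFSET, 0x1234)
struct.pack_into("<H", opt, SUBSYSTEM_OFFSET, 2)
print(entry_point_and_subsystem(bytes(opt), 0))
```

Note that the entry point is returned as an RVA, not as a file offset; it still has to be related to a section before the corresponding bytes can be located on disk.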

Immediately following the IMAGE_OPTIONAL_HEADER structure are the IMAGE_DATA_DIRECTORY (http://msdn.microsoft.com/en-us/library/ms680305.aspx) structures. These data directories, illustrated in Figure 6.6, act as a directory structure for information within the PE file, such as the IMPORT NAME and IMPORT ADDRESS tables (listings of DLL functions that are imported into and used by the executable file), the EXPORT table (for DLLs, the location of functions that are exported), the starting address and size of the Debug directory (http://msdn.microsoft.com/en-us/library/ms680305.aspx), if there is one, and the Resource directory, to name a few (of the 16 possible directories). Each data directory is listed as a relative virtual address (RVA) and size value, and in a specific, defined order.

Figure 6.6 Excerpt of IMAGE_DATA_DIRECTORY Structures Viewed in PEView

| File offset | Data | Description | Value |
| --- | --- | --- | --- |
| 00000138 | 00078004 | RVA | IMPORT Table |
| 0000013C | 00000028 | Size | |
| 00000140 | 0007A000 | RVA | RESOURCE Table |
| 00000144 | 0000114C | Size | |
| 00000148 | 00000000 | RVA | EXCEPTION Table |
| 0000014C | 00000000 | Size | |
| 00000150 | 00000000 | Offset | CERTIFICATE Table |
| 00000154 | 00000000 | Size | |

Figure 6.6 shows four of the 16 data directories available in the sample application. Each line gives the file offset of a directory field, the value stored at that offset, and the type of that value. For instance, the first line in Figure 6.6 shows that the RVA of the IMPORT table is stored at file offset 0x138 and that its value is 0x78004. From the information visible in Figure 6.6, we can see that the sample application has both an IMPORT table and a RESOURCE table, as both of those entries have nonzero values.
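Reading the data directories amounts to walking an array of 8-byte entries, each a pair of DWORDs. The sketch below assumes all 16 entries are present and that the first entry sits at file offset 0x130, which is inferred from the IMPORT entry (the second in the defined order) being at 0x138 in Figure 6.6. The short directory names are my own labels, and the buffer is synthetic.

```python
import struct

# The 16 directories in their defined order (labels are informal).
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "CERTIFICATE",
    "BASE RELOCATION", "DEBUG", "ARCHITECTURE", "GLOBAL PTR", "TLS",
    "LOAD CONFIG", "BOUND IMPORT", "IAT", "DELAY IMPORT",
    "CLR RUNTIME", "RESERVED",
]


def read_data_directories(data: bytes, first_entry_offset: int) -> list:
    """Return (name, address, size) for each 8-byte directory entry.

    The address is an RVA for every entry except CERTIFICATE, where it
    is a file offset (as the "Offset" label in Figure 6.6 indicates).
    """
    entries = []
    for index, name in enumerate(DIRECTORY_NAMES):
        address, size = struct.unpack_from(
            "<II", data, first_entry_offset + 8 * index
        )
        entries.append((name, address, size))
    return entries


# Synthetic buffer holding only the IMPORT values shown in Figure 6.6.
buffer = bytearray(0x130 + 8 * 16)
struct.pack_into("<II", buffer, 0x138, 0x00078004, 0x00000028)
for name, address, size in read_data_directories(bytes(buffer), 0x130):
    if address or size:
        print("%-16s 0x%08X  size 0x%08X" % (name, address, size))
```

Entries whose address and size are both zero, such as the EXCEPTION table in the figure, indicate that the file has no such table.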

Tip:

An RVA is used within an executable file when an address of a variable (for example) needs to be specified but hardcoded addresses cannot be used. This is because the executable image will not be loaded into the same location in memory on every system. RVAs are used because of the need to be able to specify locations in memory that are independent of the location where the file is loaded. An RVA is essentially an offset in memory, relative to where the file is loaded. The formula for computing the RVA is as follows:

RVA = (Target Address) – (Load Address)

To obtain the actual memory address (a.k.a. the Virtual Address, or VA), simply add the Load Address to the RVA.
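As a worked example of the arithmetic, the following snippet applies both formulas to the IMPORT table RVA from Figure 6.6. The load address of 0x00400000 is an assumption made purely for illustration; the actual value depends on where the image is loaded.

```python
def rva_from_va(target_address: int, load_address: int) -> int:
    """RVA = (Target Address) - (Load Address)."""
    return target_address - load_address


def va_from_rva(rva: int, load_address: int) -> int:
    """VA = (Load Address) + RVA."""
    return load_address + rva


load_address = 0x00400000  # assumed load address, for illustration only
import_table_rva = 0x78004  # IMPORT table RVA from Figure 6.6

va = va_from_rva(import_table_rva, load_address)
print(hex(va))                             # 0x478004
print(hex(rva_from_va(va, load_address)))  # 0x78004
```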

The final portion of the PE file that is of interest to us at this point is the IMAGE_SECTION_HEADER (http://msdn.microsoft.com/en-us/library/ms680341.aspx) structures. The IMAGE_FILE_HEADER structure contains a value that specifies the number of sections that should be in a PE file, and therefore the number of IMAGE_SECTION_HEADER structures that need to be read. The IMAGE_SECTION_HEADER structures are 40 bytes in size, and contain the name of the section (eight characters in length), information about the size of the section both on disk and in memory, and the characteristics of the section (i.e., whether the section can be read, written to, executed, etc.). Figure 6.7 illustrates the structure of an IMAGE_SECTION_HEADER.

Figure 6.7 IMAGE_SECTION_HEADER Viewed in PEView
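A minimal sketch of reading the section headers follows. It assumes the first header begins right after the optional header (that is, at the PE signature offset plus 4, plus 20, plus the size of the optional header) and uses the three memory permission flags from the published structure documentation. The function name, dictionary keys, and sample values are hypothetical.

```python
import struct

SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"  # 40 bytes per header
SECTION_HEADER_SIZE = 40

MEM_EXECUTE = 0x20000000
MEM_READ = 0x40000000
MEM_WRITE = 0x80000000


def parse_section_headers(data, first_header_offset, number_of_sections):
    """Return one dict per IMAGE_SECTION_HEADER."""
    sections = []
    for i in range(number_of_sections):
        fields = struct.unpack_from(
            SECTION_HEADER_FORMAT, data,
            first_header_offset + SECTION_HEADER_SIZE * i,
        )
        flags = fields[9]
        sections.append({
            # The name is padded with NUL bytes up to eight characters.
            "name": fields[0].rstrip(b"\x00").decode("ascii", errors="replace"),
            "virtual_size": fields[1],
            "virtual_address": fields[2],
            "raw_size": fields[3],
            "raw_pointer": fields[4],
            "permissions": "".join(
                letter
                for bit, letter in ((MEM_READ, "r"), (MEM_WRITE, "w"), (MEM_EXECUTE, "x"))
                if flags & bit
            ),
        })
    return sections


# One made-up header: a readable, executable ".text" section.
raw = struct.pack(SECTION_HEADER_FORMAT, b".text", 0x0F00, 0x1000,
                  0x1000, 0x0400, 0, 0, 0, 0, 0x60000020)
print(parse_section_headers(raw, 0, 1))
```

Comparing the number of headers that parse cleanly against the count reported in the IMAGE_FILE_HEADER is one way to carry out the consistency check mentioned earlier.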


Tip:

One thing to keep in mind when viewing the section names is that there are no hard and fast requirements as to what section names should or can be. The section name is nothing more than a series of characters (up to eight) that can be anything. Rather than ".text", the section name could be "timmy". Changing the name does not affect the functionality of the PE file. In fact, some malware authors will edit and modify the section names, perhaps to throw off inexperienced malware analysts. Most "normal" programs have names such as .code, .data, .rsrc, or .text. System programs may have names such as PAGE, PAGEDATA, and so forth. Although these names are normal, a malware author can easily rename the sections in a malicious program so that they appear innocuous. Some section names can be associated with packers and cryptors directly. For example, a section name beginning with UPX strongly suggests that the program has been processed with the UPX packer. We will discuss this at greater length later in this topic.
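The name-based triage described in this tip could be sketched as below. The lists contain only the names mentioned in the tip, the labels are my own, and, because section names are arbitrary, the result is a hint for the analyst and nothing more.

```python
# Names taken from the tip; neither list is exhaustive.
COMMON_NAMES = {".code", ".data", ".rsrc", ".text", "PAGE", "PAGEDATA"}
PACKER_PREFIXES = ("UPX",)


def classify_section_name(name: str) -> str:
    """Label a section name as packer-associated, common, or unusual."""
    if name.startswith(PACKER_PREFIXES):
        return "packer-associated"
    if name in COMMON_NAMES:
        return "common"
    return "unusual"


for section in (".text", "UPX0", "timmy"):
    print(section, "->", classify_section_name(section))
```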

All of the PE file information is also available via pedump.exe. The section information in Figure 6.7 appears as follows when viewed via pedump.exe:

[Listing not reproduced: pedump.exe output for the section shown in Figure 6.7]

As you can see, there is no significant difference in the information available via the two tools. The virtual size and address information determines how the executable image file will "look" when in memory, and the "raw data" information applies to the executable image file as it exists on disk.

IMPORT Tables

It’s very rare these days that an application is written completely from scratch. Most programs are constructed by accessing the Windows application program interface (API) through various functions made available in libraries (DLLs) on the system. Microsoft provides a great number of DLLs that offer access to ready-made functions for creating windows, menus, dialogs, sockets, and just about any widget, object, and construct on the system. There is no need to create any of these completely by hand when creating an application or program.

That being the case, when programs are written and then compiled and linked into executable image files, information about the DLLs and functions accessed by that program needs to be available to the operating system when the application is running. This information is maintained in the IMPORT table and the IMPORT ADDRESS table of the executable file.
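To give a feel for how this information is laid out, here is a simplified sketch that walks the import descriptors of a 32-bit image. It relies on the published layout (each IMAGE_IMPORT_DESCRIPTOR is five DWORDs, the list ends with an all-zero descriptor, and each 32-bit lookup entry is either an ordinal or the RVA of a 2-byte hint followed by a name). The function names and the section tuple format are my own, and the code does no bounds checking, so it is not suitable for hostile input as written.

```python
import struct


def rva_to_offset(rva, sections):
    """Map an RVA to a file offset using (virtual_address, virtual_size,
    raw_pointer) tuples taken from the section headers."""
    for virtual_address, virtual_size, raw_pointer in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            return rva - virtual_address + raw_pointer
    raise ValueError("RVA 0x%x is not inside any section" % rva)


def read_c_string(data, offset):
    """Read a NUL-terminated ASCII string starting at offset."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii", errors="replace")


def read_imports(data, import_rva, sections):
    """Return {dll_name: [function names or 'ordinal N']} (32-bit images)."""
    imports = {}
    descriptor = rva_to_offset(import_rva, sections)
    while True:
        fields = struct.unpack_from("<5I", data, descriptor)
        if not any(fields):  # an all-zero descriptor ends the list
            break
        lookup_rva, _stamp, _chain, name_rva, iat_rva = fields
        dll = read_c_string(data, rva_to_offset(name_rva, sections))
        functions = []
        # Fall back to the address table if there is no lookup table.
        thunk = rva_to_offset(lookup_rva or iat_rva, sections)
        while True:
            (entry,) = struct.unpack_from("<I", data, thunk)
            if entry == 0:
                break
            if entry & 0x80000000:  # high bit set: import by ordinal
                functions.append("ordinal %d" % (entry & 0xFFFF))
            else:  # RVA of a 2-byte hint followed by the function name
                name_offset = rva_to_offset(entry, sections) + 2
                functions.append(read_c_string(data, name_offset))
            thunk += 4
        imports[dll] = functions
        descriptor += 20
    return imports
```

The import_rva argument is the RVA from the IMPORT data directory, and the sections list comes from the section headers, so this step depends on the earlier structures having been parsed correctly.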

Note

A while back, I had the opportunity to work on a project that involved determining whether an executable file had network capabilities. I had done some work examining applications to determine whether they had capabilities of either a network server (listened for connections, like a Trojan backdoor) or client (made connections to servers, like an IRCbot), but with this project the goal was to automate the process. So, we started by examining available DLLs to determine which of them provided networking functionality (e.g., wininet.dll and ws2_32.dll), and then we determined which functions provided the core functionality in question. Once we had that information, we could automate the process by parsing the PE file structures, locating the IMPORT table and determining which DLLs and functions were used. One thing to keep in mind, however, is that reading the IMPORT table of a malware executable file may not be that easy if the file is obfuscated in some manner.

The pedump.exe tool provides easy access to the IMPORT table information, by locating the import data directory and parsing the structures to determine the DLLs and the functions the application uses. Example output from pedump.exe appears as follows:

[Listing not reproduced: pedump.exe "Import Table" output for the sample application]

As you can see, the sample application imports several functions from kernel32.dll. Although the DLL actually provides a number of functions that are available for use (see the "EXPORT Table" section later in this topic), this example executable imports functions such as GetSystemTimeAsFileTime() and CreateFileA() for use. Microsoft provides a good deal of information regarding many of the available functions, so you can do research online to see what various functions are meant to do. For example, the GetSystemTimeAsFileTime() function retrieves the current system time as a 64-bit FILETIME object, and the returned value represents the number of 100-nanosecond intervals since 1 Jan 1601, in Coordinated Universal Time (UTC).
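Since FILETIME values turn up frequently during analysis, it may help to see the conversion spelled out. The sketch below turns a 64-bit FILETIME into a Python datetime; the function name is my own, and the sample value is the well-known FILETIME for midnight, 1 Jan 1970 UTC.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a count of 100-nanosecond intervals since 1 Jan 1601 (UTC)."""
    # Ten 100-nanosecond intervals make one microsecond; anything finer
    # than a microsecond is discarded.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


print(filetime_to_datetime(116444736000000000).isoformat())
```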

Tip:

You can look up Microsoft API functions via MSDN. I keep a link to the Microsoft Advanced Search page on my browser toolbar for quick access. Typing in the name of the function I’m interested in, such as GetSystemTimeAsFileTime, provides me not only with information about the API function, but also with important ancillary information.

Seeing what functions an application imports gives you a general clue as to what it does (and does not do). For example, if the application does not import any of the DLLs that contain networking code, either as low-level socket functions or higher-level Internet APIs, it is unlikely that the application is a backdoor or that it can be used to transmit information off the system and onto the Internet. This is a useful technique, one that I have used to provide information and answer questions about an application. I was once given an executable image and asked whether it was or had the capability of being a network backdoor. After documenting the file, I took a look at the IMPORT table and saw that none of the imported DLLs provided networking capabilities. I took my analysis a step further by looking at the functions that were imported and found that although several provided mathematical functionality, none provided networking capability.
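Once the list of imported DLLs is in hand, this kind of check is easy to automate. The sketch below is one possible approach, not the one used on the project mentioned earlier: wininet.dll and ws2_32.dll come from that discussion, while wsock32.dll and winhttp.dll are additions of mine, and the set is not exhaustive.

```python
# DLLs that provide networking functionality; an incomplete, assumed list.
NETWORK_DLLS = {"wininet.dll", "ws2_32.dll", "wsock32.dll", "winhttp.dll"}


def networking_dlls(imported_dlls):
    """Return the imported DLL names (lowercased) found in NETWORK_DLLS."""
    return sorted({name.lower() for name in imported_dlls} & NETWORK_DLLS)


print(networking_dlls(["KERNEL32.dll", "WS2_32.dll", "USER32.dll"]))
print(networking_dlls(["kernel32.dll", "msvbvm60.dll"]))
```

An empty result makes network capability less likely, but it does not rule it out, since a module can be loaded at run time without appearing in the IMPORT table and an obfuscated file may hide its real imports.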

Another useful tool for viewing the information regarding DLLs and functions required by an application is the Dependency Walker tool, also known as depends.exe, available from the Web site of the same name. Figure 6.8 illustrates an excerpt of the Dependency Walker GUI, with the sample application dcode.exe open in the Dependency Walker.

Figure 6.8 Excerpt from Dependency Walker GUI


As illustrated in Figure 6.8, the dcode.exe application relies on functions from MSVBVM60.DLL, which in turn relies on functions from six other DLLs (each DLL ships with the most current Windows distributions). Figure 6.9 illustrates a portion of the functions exported by MSVBVM60.DLL, as reported by the Dependency Walker tool.

Figure 6.9 Functions Exported by MSVBVM60.DLL


The Dependency Walker tool allows you to see not only the DLLs and functions that an executable imports—be it an .exe or a .dll file—but also the functions exported by DLLs. We will discuss the EXPORT table a bit more in the next section.

The Dependency Walker tool also has a useful profiling function, which allows you to set specific parameters for how a module or application will be profiled, and then to launch the application to see which modules (DLLs) will be loaded. This allows you to trace the various DLL function calls and returned values as the application runs. This can be useful in detecting modules that are dynamically loaded but aren’t listed in the IMPORT tables of other modules, or for determining why an "application failed to initialize properly" error is reported. However, this falls outside the scope of static analysis, as it requires the file to be run.
