PROGRAMMING - Hacking: The Art of Exploitation


Like a row of houses on a local street, each with its own address, memory
can be thought of as a row of bytes, each with its own memory address. Each
byte of memory can be accessed by its address, and in this case the CPU
accesses this part of memory to retrieve the machine language instructions
that make up the compiled program. Older Intel x86 processors use a 32-bit
addressing scheme, while newer ones use a 64-bit one. The 32-bit processors
have 2^32 (or 4,294,967,296) possible addresses, while the 64-bit ones have
2^64 (about 1.84467441 × 10^19) possible addresses. The 64-bit processors can
run in 32-bit compatibility mode, which allows them to run 32-bit code quickly.
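
The address counts quoted above follow directly from the width of an address: an N-bit address can name 2^N distinct bytes. The short Python sketch below only illustrates that arithmetic; the function name address_count is a hypothetical helper, not something from the book.

```python
def address_count(bits):
    """Number of distinct byte addresses an address of the given width can name."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (32, 64):
        n = address_count(bits)
        # The highest address is one less than the count, since addresses start at 0.
        print(f"{bits}-bit: {n:,} addresses, highest address 0x{n - 1:x}")
```

Running it prints 4,294,967,296 addresses for 32 bits and 18,446,744,073,709,551,616 for 64 bits, which is the exact value behind the rounded 1.84467441 × 10^19 figure.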

The hexadecimal bytes in the middle of the listing above are the machine
language instructions for the x86 processor. Of course, these hexadecimal
values are only representations of the bytes of binary 1s and 0s the CPU can
understand. But since 0101010110001001111001011000001111101100111100001...
isn't very useful to anything other than the processor, the machine code is
displayed as hexadecimal bytes and each instruction is put on its own line,
like splitting a paragraph into sentences.
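
To make the relationship between the two notations concrete, here is a minimal Python sketch that converts hexadecimal bytes to a string of binary digits and back. The helper names to_binary and to_hex are assumptions made for illustration. The sample bytes 55 and 89 e5 are the ones that open main in the objdump output later in this section.

```python
def to_binary(hex_bytes):
    """Render space-separated hexadecimal bytes as one run of binary digits."""
    return "".join(f"{int(b, 16):08b}" for b in hex_bytes.split())


def to_hex(bits):
    """Group binary digits into 8-bit bytes and render each as two hex digits."""
    return " ".join(f"{int(bits[i:i + 8], 2):02x}" for i in range(0, len(bits), 8))


if __name__ == "__main__":
    print(to_binary("55"))              # first instruction: one byte
    print(to_binary("89 e5"))           # second instruction: two bytes
    print(to_hex("0101010110001001"))   # the first sixteen bits of the binary run
```

Each hexadecimal digit stands for exactly four bits, so two digits always describe one byte. That fixed grouping is why the hexadecimal form is so much easier to scan than the raw bits.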

Come to think of it, the hexadecimal bytes really aren't very useful
themselves, either; that's where assembly language comes in. The instructions
on the far right are in assembly language. Assembly language is really just a
collection of mnemonics for the corresponding machine language instructions.
The instruction ret is far easier to remember and make sense of than 0xc3 or
11000011. Unlike C and other compiled languages, assembly language
instructions have a direct one-to-one relationship with their corresponding
machine language instructions. This means that since every processor
architecture has different machine language instructions, each also has a
different form of assembly language. Assembly is just a way for programmers to
represent the machine language instructions that are given to the processor.

Exactly how these machine language instructions are represented is simply a
matter of convention and preference. While you can theoretically create your
own x86 assembly language syntax, most people stick with one of the two main
types: AT&T syntax and Intel syntax. The assembly shown in the output on
page 21 is AT&T syntax, as just about all of Linux's disassembly tools use this
syntax by default. It's easy to recognize AT&T syntax by the cacophony of % and
$ symbols prefixing everything (take a look again at the example on page 21).
The same code can be shown in Intel syntax by providing an additional
command-line option, -M intel, to objdump, as shown in the output below.

reader@hacking:~/booksrc $ objdump -M intel -D a.out | grep -A20 main.:
08048374 <main>:
 8048374:   55                      push   ebp
 8048375:   89 e5                   mov    ebp,esp
 8048377:   83 ec 08                sub    esp,0x8
 804837a:   83 e4 f0                and    esp,0xfffffff0
 804837d:   b8 00 00 00 00          mov    eax,0x0
 8048382:   29 c4                   sub    esp,eax
 8048384:   c7 45 fc 00 00 00 00    mov    DWORD PTR [ebp-4],0x0
 804838b:   83 7d fc 09             cmp    DWORD PTR [ebp-4],0x9
 804838f:   7e 02                   jle    8048393 <main+0x1f>
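
The idea that assembly is only a set of names for fixed byte sequences, and that AT&T and Intel syntax are two spellings of the same bytes, can be sketched with a toy lookup table. The Python below is an illustration under stated assumptions, not a real disassembler: it knows only the first three encodings from the listing above plus the ret byte 0xc3 mentioned earlier, and the names ENCODINGS and render are hypothetical. The AT&T spellings are written from the syntax rules described above (a % before registers, a $ before constants, source operand first) rather than copied from the page 21 output.

```python
# Toy table: byte sequence -> (Intel syntax, AT&T syntax).
# Only a handful of encodings are covered; real x86 decoding is far more involved.
ENCODINGS = {
    bytes([0x55]): ("push ebp", "push %ebp"),
    bytes([0x89, 0xE5]): ("mov ebp,esp", "mov %esp,%ebp"),
    bytes([0x83, 0xEC, 0x08]): ("sub esp,0x8", "sub $0x8,%esp"),
    bytes([0xC3]): ("ret", "ret"),
}


def render(code, syntax="intel"):
    """Walk the bytes and return one assembly line per known instruction."""
    column = 0 if syntax == "intel" else 1
    lines = []
    offset = 0
    while offset < len(code):
        # Try each instruction length this toy table contains.
        for length in (1, 2, 3):
            chunk = code[offset:offset + length]
            if chunk in ENCODINGS:
                lines.append(ENCODINGS[chunk][column])
                offset += length
                break
        else:
            raise ValueError(f"unknown bytes at offset {offset}")
    return lines


if __name__ == "__main__":
    machine_code = bytes.fromhex("5589e583ec08c3")
    for syntax in ("intel", "att"):
        print(syntax, render(machine_code, syntax))
```

The same seven bytes yield two different texts, which is the point: the syntax is a presentation choice, while the bytes are what the processor actually executes. Note that the operand order flips between the two syntaxes, with Intel putting the destination first and AT&T putting it last.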
