CHAPTER 3
File Structure
In this chapter, we describe the layout and content of the four main sections of a PDF
file, and the syntax of the objects that make up each one. We also outline the process of
reading a PDF file into a high-level data structure, and the converse operation of writing
that structure to a PDF file.
File Layout
A simple valid PDF file has four parts, in order:
1. The header, which gives the PDF version number.
2. The body, containing the pages, graphical content, and much of the ancillary information, all encoded as a series of objects.
3. The cross-reference table, which lists the position of each object within the file, to facilitate random access.
4. The trailer, including the trailer dictionary, which helps to locate each part of the file and lists various pieces of metadata which can be read without processing the whole file.
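The four parts listed above can be located with a short Python sketch. It is a minimal illustration, not a full reader: it assumes a simple file with a single classic cross-reference table and no incremental updates. The function name `locate_sections` and the keys of the returned dictionary are hypothetical, and the `xref`, `trailer`, and `startxref` keywords it searches for are those of the PDF file format rather than anything named in the list itself.

```python
import re


def locate_sections(data: bytes) -> dict:
    """Locate the four parts of a simple PDF file held in memory.

    Assumes one classic cross-reference table and no incremental updates.
    """
    # Header: the file begins with %PDF-M.N, giving the version number.
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("no %PDF-M.N header at start of file")

    # Body: starts at the first "N G obj" object header after the header.
    first_obj = re.compile(rb"\d+\s+\d+\s+obj\b").search(data, header.end())
    if first_obj is None:
        raise ValueError("no objects found in body")

    # The last "startxref" keyword is followed by the byte offset of the
    # cross-reference table, so we look for it from the end of the file.
    marker = data.rfind(b"startxref")
    if marker == -1:
        raise ValueError("no startxref keyword found")
    xref_offset = int(data[marker + len(b"startxref"):].split()[0])
    if not data.startswith(b"xref", xref_offset):
        raise ValueError("startxref does not point at an xref table")

    # Trailer: the "trailer" keyword follows the cross-reference table.
    trailer_offset = data.find(b"trailer", xref_offset)
    if trailer_offset == -1:
        raise ValueError("no trailer keyword after the xref table")

    return {
        "version": header.group(1).decode("ascii"),
        "header": 0,
        "body": first_obj.start(),
        "xref": xref_offset,
        "trailer": trailer_offset,
    }
```

Given the bytes of a file, for instance `locate_sections(open("hello.pdf", "rb").read())` with a hypothetical file name, the result maps each part to the byte offset at which it starts. Reading the cross-reference offset from the end of the file, instead of scanning forward for it, mirrors the random-access purpose of the table.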
For reference, we reproduce the “Hello, World” PDF from Chapter 2 as Example 3-1.
The first line of each of the four sections has been annotated.
Example 3-1. A small PDF file
%PDF-1.0          Header starts here
%âãÏÓ
1 0 obj           Body starts here
<<
/Kids [2 0 R]
/Count 1
/Type /Pages
>>
endobj
2 0 obj
<<