Working with Pdftk - PDF Explained


Splitting Documents

To take a selection of pages from a document, we use the same syntax as for merging,

because our operation is equivalent to merging just one file with a page range:

pdftk file1.pdf 2-20 output out.pdf

This writes pages 2-20 inclusive to the output file. Pdftk has a separate facility for splitting a file into individual pages and writing them all to disk at once, using the burst operation:

pdftk input.pdf burst

By default, this writes the pages to pg_0001.pdf, pg_0002.pdf, etc. To write them with

differently-formatted names, an output string in the style of the built-in C function

printf may be provided. For example:

pdftk input.pdf burst output page%03d.pdf

would create page001.pdf, page002.pdf, etc.
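
The naming scheme can be illustrated with Python's printf-style % operator. This is only a sketch of how such a pattern expands, not pdftk's own code; burst_names is a hypothetical helper, and its default pattern is assumed to match the default names quoted above.

```python
def burst_names(page_count, pattern="pg_%04d.pdf"):
    """Yield one output filename per page, numbering pages from 1."""
    for number in range(1, page_count + 1):
        # The % operator applies the printf-style pattern to the page number.
        yield pattern % number


# Default-style names, then the custom pattern from the example above.
print(list(burst_names(2)))
print(list(burst_names(2, "page%03d.pdf")))
```

The %03d and %04d fields pad the page number with leading zeros to three and four digits respectively, which keeps the output files in page order when sorted by name.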

The burst operation also writes the document's metadata to the file doc_data.txt. We

consider this functionality in “Extracting and Setting Metadata”.

What Happens when Files are Split

In order to split a PDF into several parts of one or more pages each, a program such as

pdftk would take the following steps:

1. Load and parse the input document into an object graph, possibly lazily (so that

pages which aren't going to appear in any of the output don't have to be processed).

2. Create a new, empty PDF data structure for each new document. Create a new

page tree for each page range, using the same object numbers as the existing

document.

3. Copy all the objects from the input PDF into each output PDF.

4. Remove all objects not required in each PDF (i.e., ones which are no longer

referenced).

To perform the last step correctly, it is important to process bookmarks, destinations,

and other cross-page objects to remove references to pages which no longer appear in

a given output file, since a single errant reference could result in a source file's whole

object graph being included, even though none of it is required.
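
The effect of a stale reference can be seen in a toy model. The sketch below is not how pdftk is implemented: it assumes a document can be reduced to a dictionary mapping object numbers to the object numbers they reference, and the names reachable and split_out, along with the eight-object sample document, are invented for illustration.

```python
def reachable(objects, root):
    """Return the set of object numbers reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        number = stack.pop()
        if number in seen or number not in objects:
            continue
        seen.add(number)
        stack.extend(objects[number])
    return seen


def split_out(objects, root, page_tree, keep_pages, all_pages, fix_references):
    """Build one output document that contains only keep_pages."""
    # Step 3: copy every object from the input.
    out = {number: list(refs) for number, refs in objects.items()}
    # Step 2: new page tree for this range, reusing the old object number.
    out[page_tree] = list(keep_pages)
    if fix_references:
        # Drop references (such as bookmark destinations) to removed pages.
        dropped = set(all_pages) - set(keep_pages)
        for number in out:
            out[number] = [r for r in out[number] if r not in dropped]
    # Step 4: remove everything no longer referenced from the root.
    live = reachable(out, root)
    return {number: refs for number, refs in out.items() if number in live}


# Hypothetical two-page document: object number -> referenced object numbers.
source = {
    1: [2, 3],  # catalog -> page tree, outline root
    2: [4, 5],  # page tree -> first page, second page
    3: [8],     # outline root -> one bookmark
    4: [2, 6],  # first page -> parent, content
    5: [2, 7],  # second page -> parent, content
    6: [],      # content of the first page
    7: [],      # content of the second page
    8: [5],     # bookmark -> destination on the second page
}

careless = split_out(source, 1, 2, [4], [4, 5], fix_references=False)
careful = split_out(source, 1, 2, [4], [4, 5], fix_references=True)
print(sorted(careless))
print(sorted(careful))
```

In this model the output is meant to hold only the first page. Without the reference fix, the bookmark (object 8) still points at the second page, so objects 5 and 7 survive the pruning and all eight objects are written. With the fix, only objects 1, 2, 3, 4, 6 and 8 remain.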
