Splitting Documents
To take a selection of pages from a document, we use the same syntax as for merging,
because our operation is equivalent to merging just one file with a page range:
pdftk file1.pdf cat 2-20 output out.pdf
This writes pages 2-20 inclusive to the output file. Pdftk has a separate facility for
splitting a file into individual pages and writing them all to disk at once, using the
burst operation:
pdftk input.pdf burst
By default, this writes the pages to pg_0001.pdf, pg_0002.pdf, etc. To write them with
differently formatted names, an output string in the style of the C standard library
function printf may be provided. For example:
pdftk input.pdf burst output page%03d.pdf
This would create page001.pdf, page002.pdf, etc.
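The expansion of such a pattern can be reproduced with Python's % operator, which treats %d conversions the same way printf does. This is only an illustrative sketch: burst_names is a hypothetical helper, not part of pdftk, and it assumes pages are numbered from 1.

```python
def burst_names(pattern, page_count):
    """Expand a printf-style pattern once per page, numbering from 1."""
    return [pattern % page for page in range(1, page_count + 1)]


# %03d pads the page number with zeros to a width of three digits.
print(burst_names("page%03d.pdf", 3))
# %04d pads to four digits, as in the default pg_0001.pdf naming.
print(burst_names("pg_%04d.pdf", 2))
```

A pattern without zero padding, such as page%d.pdf, would give names that do not sort in page order once the document has ten or more pages, which is presumably why the default uses a fixed width.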
The burst operation also writes the document's metadata to the file doc_data.txt. We
consider this functionality in “Extracting and Setting Metadata”.
What Happens when Files are Split
In order to split a PDF into several parts of one or more pages each, a program such as
pdftk would take the following steps:
1. Load and parse the input document into an object graph, possibly lazily (so that
pages which aren't going to appear in any of the output don't have to be processed).
2. Create a new, empty PDF data structure for each new document. Create a new
page tree for each page range, using the same object numbers as the existing
document.
3. Copy all the objects from the input PDF into each output PDF.
4. Remove all objects not required in each PDF (i.e., ones which are no longer
referenced).
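Steps 2 to 4 can be sketched on a toy object graph. This is a simplified model under stated assumptions, not pdftk's actual implementation: a document is reduced to a dictionary mapping each object number to the object numbers it references, the page tree is flattened to a list of page object numbers, and the names reachable and split are hypothetical. Step 1 (parsing) is omitted.

```python
def reachable(objects, roots):
    """Return the set of object numbers reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        num = stack.pop()
        if num in seen or num not in objects:
            continue
        seen.add(num)
        stack.extend(objects[num])
    return seen


def split(objects, pages, ranges):
    """Split a toy document into one output per inclusive (first, last) range.

    objects maps an object number to the list of object numbers it references.
    pages lists the page object numbers in document order.
    """
    outputs = []
    for first, last in ranges:
        # Step 2: new page list for this range, reusing the object numbers.
        kept_pages = pages[first - 1:last]
        # Step 3: copy every object from the input.
        copied = {num: list(refs) for num, refs in objects.items()}
        # Step 4: keep only what the retained pages still reference.
        live = reachable(copied, kept_pages)
        outputs.append({
            "pages": kept_pages,
            "objects": {n: r for n, r in copied.items() if n in live},
        })
    return outputs


# Pages 10, 11 and 12 each have a content stream (20, 21, 22).
# Pages 10 and 11 share font 30; page 12 uses font 31.
objects = {10: [20, 30], 11: [21, 30], 12: [22, 31],
           20: [], 21: [], 22: [], 30: [], 31: []}
first, second = split(objects, [10, 11, 12], [(1, 2), (3, 3)])
print(sorted(first["objects"]))   # [10, 11, 20, 21, 30]
print(sorted(second["objects"]))  # [12, 22, 31]
```

Because object numbers are preserved, the shared font 30 appears once in the first output and not at all in the second, without any renumbering of references.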
To perform the last step correctly, it is important to process bookmarks, destinations,
and other cross-page objects to remove references to pages which no longer appear in
a given output file, since a single errant reference could result in a source file's whole
object graph being included, even though none of it is required.
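The effect of a stale reference can be seen on a small toy graph. The sketch below makes the same kind of simplifying assumptions as any toy model: objects are reduced to lists of referenced object numbers, and reachable and drop_stale_page_refs are hypothetical names. A real tool might remove or retarget the bookmark itself rather than simply clearing its destination.

```python
def reachable(objects, roots):
    """Return the set of object numbers reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        num = stack.pop()
        if num in seen or num not in objects:
            continue
        seen.add(num)
        stack.extend(objects[num])
    return seen


def drop_stale_page_refs(objects, all_pages, kept_pages):
    """Remove every reference that points at a page not kept in this output."""
    removed = set(all_pages) - set(kept_pages)
    return {num: [ref for ref in refs if ref not in removed]
            for num, refs in objects.items()}


objects = {
    1: [2],     # outline root -> bookmark 2
    2: [12],    # bookmark 2 -> destination on page 12
    10: [20],   # page 10 -> its content stream
    12: [22],   # page 12 -> its content stream
    20: [],
    22: [],
}
kept_pages = [10]  # this output file should contain page 10 only

# Keeping the outline as-is drags page 12 and its content into the output.
naive = reachable(objects, kept_pages + [1])
print(sorted(naive))   # [1, 2, 10, 12, 20, 22]

# Clearing references to removed pages first lets page 12 be discarded.
pruned = drop_stale_page_refs(objects, [10, 12], kept_pages)
clean = reachable(pruned, kept_pages + [1])
print(sorted(clean))   # [1, 2, 10, 20]
```

In a real PDF each page also refers back to its parent page tree node, so a single surviving reference to an unwanted page can reach far more of the source document than this toy graph shows.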