Copying pages from existing PDF documents (iText 5)

You probably remember the Superman PDF from topic 5. The Hero example imported a plain text file containing PDF syntax into the direct content. I explained that this wasn’t standard practice. If you want to reuse existing content, it’s dangerous to copy and paste PDF syntax like I did in listing 5.14. There are safer ways to import existing content, as you’ll find out in the next example.

In this section, you’ll use an object named PdfImportedPage to copy the content from an existing PDF opened with PdfReader into a new Document written by PdfWriter.

Importing pages

Let’s continue working with the timetable from topic 3. Suppose you want to reuse the pages of this document and treat them as if every page were an image. Figure 6.1 shows how you could organize these imported pages into a PdfPTable. The document in the front of figure 6.1 is created with the code in listing 6.4.

Listing 6.4 ImportingPages1.java

Figure 6.1 Importing pages from an existing PDF document

You’ll recognize the five steps in the PDF creation process discussed in part 1. Now you’re also creating a PdfReader object and looping over all the pages, getting PdfImportedPage instances with the getImportedPage() method (as highlighted in bold). What does this method do?

PAGE CONTENT AND RESOURCES

If you browse the API of the PdfReader class, you’ll discover the getPageContent() method, which returns the content stream of a page. This content stream is very similar to what’s inside the hero.txt file. In general, such a content stream contains references to external objects, images, and fonts.

In section 3.4.1, for instance, we examined the PDF syntax needed to draw a raster image:

[Snippet not preserved: PDF syntax that paints the image named /img0]

In this snippet, /img0 referred to a key in the /Resources dictionary of the page. The corresponding value was a reference to a stream object containing the bits and bytes of the image. Without the bits and bytes of the image, the PDF syntax referring to /img0 is meaningless.

WARNING It doesn’t make sense to get the content stream of a page from one PDF document, and copy that stream into another PDF without copying all the resources that are needed.

The Hero example was an exception: the syntax to draw the vector image of Superman was self-contained, and this is very unusual. As soon as there’s text involved, you’ll have at least a reference to a font. If you don’t copy that font, you’ll get warnings or errors, such as "Could not find a font in the Resources dictionary." That’s why it’s never advisable to extract a page from PdfReader directly. Instead, you should pass the reader object to the writer class, and ask the writer (not the reader!) to import a page. A PdfImportedPage object is returned. Behind the scenes, all the necessary resources (such as images and fonts) are retrieved and copied to the writer.
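To make the dependency between a content stream and its resources more tangible, here is a minimal sketch in plain Java (no iText involved). It scans a hypothetical content stream for names used with the `Do` (paint XObject) and `Tf` (select font) operators and reports which ones are absent from a set of copied resources. The class name `ResourceCheck`, the sample stream, and the simplified regular expression are assumptions made for illustration; a real PDF parser tokenizes names more carefully.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResourceCheck {

    // Simplified: matches "/name Do" (paint an XObject) and "/name <size> Tf" (select a font).
    private static final Pattern NAME_USE =
            Pattern.compile("/(\\S+)\\s+(?:Do|[-+]?[0-9.]+\\s+Tf)\\b");

    /** Returns the resource names a content stream refers to, in order of first use. */
    static Set<String> referencedNames(String contentStream) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = NAME_USE.matcher(contentStream);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Returns the referenced names that are absent from the available resources. */
    static Set<String> missingNames(String contentStream, Set<String> availableResources) {
        Set<String> missing = referencedNames(contentStream);
        missing.removeAll(availableResources);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical content stream: paints an image and shows a line of text.
        String stream = "q 100 0 0 100 36 36 cm /img0 Do Q BT /F1 12 Tf (Hello) Tj ET";
        // Pretend that only the image was copied to the new document.
        Set<String> copied = Collections.singleton("img0");
        System.out.println("Referenced: " + referencedNames(stream));
        System.out.println("Missing:    " + missingNames(stream, copied));
    }
}
```

In this made-up case the font `F1` is reported as missing, which is the situation that leads to errors in a viewer. Importing the page through the writer avoids the problem because the resources travel with the content.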

FAQ Why are all my links lost when I copy a page with PdfImportedPage? It’s important to understand the difference between resources needed to render the content of a page and the interactive features of a page. In general, these features are called annotations. They include links, text annotations, and form fields. Annotations aren’t part of the content stream. They aren’t listed in the resources dictionary of the page, but in the annotation dictionary. These interactive features aren’t copied when using PdfImportedPage, which means that all interactivity is lost when copying a page with the getImportedPage() method of the PdfWriter class.

The PdfImportedPage class extends PdfTemplate, but you can’t add any new content to it. It’s a read-only XObject you can reuse in a document with the method addTemplate(), or you can wrap it inside an Image. You’ve already used these techniques in section 3.4. The original dimensions of each imported page are the same as the original media box, but in this example, the PdfImportedPages are scaled to fit inside a table. Note that the rotation of the original page isn’t taken into account. If that’s a problem, you’ll have to apply the rotation.
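The arithmetic behind "scaled to fit" and "apply the rotation" can be sketched without iText. The helper below is only an illustration: `FitPage`, `rotatedSize`, and `scaleToFit` are hypothetical names, and the 200 x 200 point cell is an assumed size. In an iText 5 program, the rotation value would typically come from the reader (for instance via getPageRotation()).

```java
class FitPage {

    /** Width and height of a page as seen after a rotation by a multiple of 90 degrees. */
    static float[] rotatedSize(float width, float height, int rotationDegrees) {
        // Quarter turns of 90 or 270 degrees swap width and height.
        boolean swap = (rotationDegrees / 90) % 2 != 0;
        return swap ? new float[] {height, width} : new float[] {width, height};
    }

    /** Largest uniform scale factor for which the page still fits inside the box. */
    static float scaleToFit(float pageWidth, float pageHeight, float boxWidth, float boxHeight) {
        return Math.min(boxWidth / pageWidth, boxHeight / pageHeight);
    }

    public static void main(String[] args) {
        // A4 media box in points (595 x 842), page rotation of 90 degrees.
        float[] size = rotatedSize(595f, 842f, 90);
        // Hypothetical table cell of 200 x 200 points.
        float scale = scaleToFit(size[0], size[1], 200f, 200f);
        System.out.printf("visible size: %.0f x %.0f, scale: %.4f%n", size[0], size[1], scale);
    }
}
```

Taking the smaller of the two ratios keeps the aspect ratio intact, so the page is never distorted to fill the cell.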

Listing 6.5 ImportingPages2.java

You can see the result in figure 6.1 (the figure in the back). Observe that cell and image rotations go counterclockwise. In the next example, we’ll look at how to apply more transformations.

Scaling and superimposing pages

You can transform pages in iText, just like you can transform images. Do you remember figure 3.2? That was the image I used to explain the different content layers used by iText. I created this image by generating a document with four pages, and then importing those pages into a new one; see figure 6.2.

The imported pages are added to the new PDF document using addTemplate(). The parameters are calculated so that each page is scaled and skewed.
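If the six numbers passed to addTemplate() feel abstract, it may help to see what they do to the corners of a page. The following sketch applies a PDF transformation matrix [a b c d e f] to a point using the standard rule x' = a·x + c·y + e and y' = b·x + d·y + f. The class name `TemplateMatrix` and the sample values are assumptions chosen for illustration, not the parameters used to create figure 6.2.

```java
class TemplateMatrix {

    /** Applies the PDF matrix [a b c d e f] to the point (x, y). */
    static double[] apply(double a, double b, double c, double d, double e, double f,
                          double x, double y) {
        return new double[] {a * x + c * y + e, b * x + d * y + f};
    }

    public static void main(String[] args) {
        // Hypothetical values: scale to 50%, skew horizontally, then move the origin.
        double a = 0.5, b = 0, c = 0.35, d = 0.5, e = 60, f = 120;
        double w = 595, h = 842; // A4 page in points
        double[][] corners = {{0, 0}, {w, 0}, {w, h}, {0, h}};
        for (double[] p : corners) {
            double[] q = apply(a, b, c, d, e, f, p[0], p[1]);
            System.out.printf("(%.0f, %.0f) -> (%.1f, %.1f)%n", p[0], p[1], q[0], q[1]);
        }
    }
}
```

Here a and d scale the page, c shifts points to the right in proportion to their height (the skew), and e and f move the lower-left corner.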

Figure 6.2 Scaling and skewing pages from an existing PDF

Listing 6.6 Layers.java

A common technique used with PDF files is called superimposing.

SUPERIMPOSING PDF PAGES

Superimposing means that you add different PDF pages on top of each other on the same page. You could do this with the four pages shown to the left in figure 6.2 to obtain the PDF shown in figure 6.3.

Figure 6.3 PDF created by superimposing four different pages

Listing 6.7 Superimposing.java

Superimposing is often used to create documents with a standard header and footer.

IMPORTING COMPANY STATIONERY

Suppose your company has preprinted paper containing the company name and logo in the letterhead, and maybe also a watermark. All letters are printed on this company stationery. You can achieve something similar with PDF, as shown in figure 6.4.

Figure 6.4 Using an existing PDF as background image for new PDFs

In figure 6.4, the PDF to the left is the equivalent of the preprinted paper. When creating a new document, as shown to the right, the template page is imported and added to the background of each new page using a page event.

Listing 6.8 Stationery.java

We’ll conclude the series of PdfImportedPage examples by introducing two more concepts.

N-up copying and tiling PDF documents

When searching for PDF tools on the internet, you’ll find numerous small tools that are designed to meet specific requirements, such as one that creates an N-up layout in a PDF file.

To cut paper costs by 50 percent when printing a PDF document, you can copy an existing PDF into a new one that has half the number of pages. All you have to do is put two pages next to each other on one page. This is called 2-up copying. Figure 6.5 shows the document you created in the previous example in its 2-up, 4-up, 8-up, and 16-up forms.
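The placement of each page on an N-up sheet comes down to a little grid arithmetic. The sketch below is one possible way to compute it, not necessarily how the example in this section does it: `NUpGrid` and `placement` are hypothetical names, pages are assumed to fill the sheet row by row from the top left, and each page is centered in its cell. The three results would then be used as the scale and offset arguments of addTemplate(page, scale, 0, 0, scale, offsetX, offsetY).

```java
class NUpGrid {

    /**
     * Returns {scale, offsetX, offsetY} for the page at zero-based position
     * index on a sheet holding cols x rows pages, filled row by row from the
     * top left. PDF coordinates have their origin at the bottom left.
     */
    static float[] placement(int index, int cols, int rows,
                             float pageW, float pageH, float sheetW, float sheetH) {
        float cellW = sheetW / cols;
        float cellH = sheetH / rows;
        float scale = Math.min(cellW / pageW, cellH / pageH);
        int col = index % cols;
        int row = (index / cols) % rows;
        // Center the scaled page inside its cell.
        float offsetX = col * cellW + (cellW - pageW * scale) / 2f;
        float offsetY = sheetH - (row + 1) * cellH + (cellH - pageH * scale) / 2f;
        return new float[] {scale, offsetX, offsetY};
    }

    public static void main(String[] args) {
        // 2-up: two A4 portrait pages side by side on an A4 landscape sheet.
        for (int i = 0; i < 2; i++) {
            float[] p = placement(i, 2, 1, 595f, 842f, 842f, 595f);
            System.out.printf("page %d: scale=%.3f x=%.1f y=%.1f%n", i + 1, p[0], p[1], p[2]);
        }
    }
}
```

For 4-up, 8-up, and 16-up layouts, only the number of columns and rows (and the orientation of the sheet) changes.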

Most of the tools you can find online have iText on the inside.

Figure 6.5 N-up copying combines multiple pages onto one page

Figure 6.6 Scaling and tiling a PDF file

The opposite of N-up copying a PDF file is when you have one page, and you want to print it on different pages; see figure 6.6. We already looked at this in topic 5, but now you’ll do the exercise again using PdfImportedPage.

The next bit of code takes one page from a PDF document and scales it so that the one page is "tiled" over 16 pages.
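Before looking at the iText code, here is the arithmetic on its own, as a sketch in plain Java. When a page is enlarged by a factor n, every output page has to show a different part of the enlargement, which is done by shifting the enlarged page left and down. `TileOffsets` and `offset` are hypothetical names, and numbering the tiles row by row from the top left is an assumption. Each pair of values would serve as the last two arguments of addTemplate(page, n, 0, 0, n, x, y).

```java
class TileOffsets {

    /**
     * Returns the {x, y} translation that brings tile number index (zero-based,
     * row by row from the top left) of an n x n enlargement onto the output page.
     */
    static float[] offset(int index, int n, float pageW, float pageH) {
        int col = index % n;
        int row = index / n;
        // Shift left by whole page widths; the top row needs the largest downward shift.
        float x = pageW * -col;
        float y = pageH * (row - (n - 1));
        return new float[] {x, y};
    }

    public static void main(String[] args) {
        // One A4 page (595 x 842 points) tiled over 4 x 4 = 16 output pages.
        for (int i = 0; i < 16; i++) {
            float[] o = offset(i, 4, 595f, 842f);
            System.out.printf("tile %2d: x=%.0f y=%.0f%n", i + 1, o[0], o[1]);
        }
    }
}
```

The offsets are zero or negative because the lower-left corner of the enlarged page moves off the visible area for every tile except the bottom-left one.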

Listing 6.10 TilingHero.java

In this section, we’ve been reusing content from existing PDF documents in a new document. You can take digital photocopies of existing pages, scale them up or down, and use them as if they were an image or an XObject.

In the next section, we’re going to take an existing PDF and add extra content.
