Embedding files into a PDF Part 2 (iText 5)

PDF packages, portable collections, or portfolios

Suppose that you want to bundle a set of documents that belong together into one PDF, and organize them in a way that the attachment panel can’t accommodate. Suppose you want to add your own keys, and to allow the end user to sort the entries in the collection of documents based on those custom keys.

This functionality was introduced in PDF 1.7, and it’s known under different names. People working with it on the lowest level will talk about portable collections, because that’s the name that is used in the PDF reference and in ISO-32000-1. People who work on a higher level using Adobe Acrobat or Adobe Reader will say that a PDF document as shown in figure 16.4 is a portfolio. And if you ever hear people talk about PDF packages, that’s the original name of this functionality.

Figure 16.4 A portable collection containing PDF files

Figure 16.4 shows a collection of PDF files with information about the movies of Stanley Kubrick. The end user gets an overview with the year the movie was made, the movie title, the run length, and the file size. The user can also sort the entries based on these fields. Clicking one of the lines in the overview opens the file.


The fields in this UI are defined in a collection schema dictionary. This dictionary consists of a variable number of individual collection field dictionaries. The next listing shows how to create these dictionaries.

Listing 16.9 KubrickMovies.java

In listing 16.9, you create five PdfCollectionField objects. The constructor of this class accepts a name that will be used as the caption of a column in the detail view of the collection. It also expects a field type, which must be one of the values listed in table 16.2.

Table 16.2 Collection field types

Parameter | Name | Description
TEXT | /S | The field value will contain text; iText will use the object PdfString internally.
DATE | /D | The field value will contain a date; iText will use the object PdfDate internally.
NUMBER | /N | The field value will contain a number; iText will use the object PdfNumber internally.
FILENAME | /F | The value will be obtained from the /UF entry in the file specification.
DESC | /Desc | The value will be obtained from the /Desc entry in the file specification.
MODDATE | /ModDate | The value will be obtained from the /ModDate entry in the file specification.
CREATIONDATE | /CreationDate | The value will be obtained from the /CreationDate entry in the file specification.
SIZE | /Size | The size of the embedded file as identified by the /Size entry in the /Params dictionary of the stream dictionary of the embedded file.

You can set the order of the fields in the UI with the setOrder() method. Observe that in listing 16.9 you make one field invisible with setVisible(false); as a result, there’s no column for that field in figure 16.4. Visibility defaults to true, so all other fields are shown. Finally, you can make a field editable with the setEditable() method. By default, fields are not editable.
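The effect of the order and visibility settings can be modeled in a few lines of plain Java. This is only a sketch of the behavior described above, not iText code: the Field class, the visibleColumns() helper, and the captions are hypothetical stand-ins for the real schema objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SchemaColumnsSketch {

    /** Hypothetical stand-in for one collection field: caption, order, visibility. */
    static final class Field {
        final String caption;
        final int order;
        final boolean visible;

        Field(String caption, int order, boolean visible) {
            this.caption = caption;
            this.order = order;
            this.visible = visible;
        }
    }

    /** Returns the captions of the visible fields, sorted by their order value. */
    static List<String> visibleColumns(List<Field> schema) {
        List<Field> shown = new ArrayList<>();
        for (Field f : schema) {
            if (f.visible) {
                shown.add(f); // invisible fields never become a column
            }
        }
        shown.sort(Comparator.comparingInt((Field f) -> f.order));
        List<String> captions = new ArrayList<>();
        for (Field f : shown) {
            captions.add(f.caption);
        }
        return captions;
    }

    public static void main(String[] args) {
        // Assumed captions; the fields are deliberately added out of order.
        List<Field> schema = new ArrayList<>();
        schema.add(new Field("File size", 4, true));
        schema.add(new Field("File name", 0, false));
        schema.add(new Field("Movie title", 2, true));
        schema.add(new Field("Run length", 3, true));
        schema.add(new Field("Year", 1, true));
        System.out.println(visibleColumns(schema));
    }
}
```

The order in which fields are added to the schema doesn't matter in this model; only the order value and the visibility flag decide which columns appear and where.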

NOTE If the collection schema is absent, the Reader will choose useful defaults taken from the file specification dictionary, such as the filename and the file size.

The collection schema is used in the collection dictionary of the PDF document. You construct a PdfCollection dictionary with one of the following preferences as a parameter:

■ DETAILS—The collection view is presented in detail mode, with all information in the schema dictionary presented in a multicolumn format. This mode provides the most information to the user. See figure 16.4.

■ TILE—The collection view is presented in tile mode, with each file in the collection denoted by a small icon and a subset of information from the schema dictionary. This mode provides top-level information about the file attachments to the user. See figure 16.5.

■ HIDDEN—The collection view is initially hidden, without preventing the user from obtaining a file list via explicit actions.

■ CUSTOM—The collection view is presented by a custom navigator. This option isn’t described in ISO-32000-1, but in Adobe’s extensions to ISO-32000-1 (level 3).

The end user can always switch from the initial view to another view.

The files presented in the UI can be sorted in different ways, and you can define the sort order using a PdfCollectionSort object. You construct this object by passing the name of a field that has to be used to sort the items as a parameter. With the setSortOrder() method, you can sort the items in ascending (true) or descending (false) order. If you want to involve multiple fields, you have to pass an array of field names as a parameter of the PdfCollectionSort constructor as well as a corresponding array of Boolean values for the sort order.
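To see how an array of field names pairs up with an array of Boolean sort directions, consider this plain-Java model. It is a sketch of the idea only and doesn't use PdfCollectionSort: the byFields() and entry() helpers, the field names YEAR and TITLE, and the sample values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiKeySortSketch {

    /** Builds a comparator from parallel arrays: one direction flag per field name. */
    static Comparator<Map<String, String>> byFields(String[] keys, boolean[] ascending) {
        if (keys.length != ascending.length) {
            throw new IllegalArgumentException("Need exactly one sort direction per key");
        }
        return (a, b) -> {
            for (int i = 0; i < keys.length; i++) {
                String left = a.get(keys[i]);
                String right = b.get(keys[i]);
                // true = ascending, false = descending
                int c = ascending[i] ? left.compareTo(right) : right.compareTo(left);
                if (c != 0) {
                    return c; // the first field that differs decides
                }
            }
            return 0;
        };
    }

    /** Hypothetical helper that creates one entry with a YEAR and a TITLE value. */
    static Map<String, String> entry(String year, String title) {
        Map<String, String> m = new HashMap<>();
        m.put("YEAR", year);
        m.put("TITLE", title);
        return m;
    }

    public static void main(String[] args) {
        // Made-up sample values, chosen only to produce a tie on YEAR.
        // Years are compared as strings, which works for four-digit values.
        List<Map<String, String>> items = new ArrayList<>();
        items.add(entry("1980", "Beta"));
        items.add(entry("1990", "Gamma"));
        items.add(entry("1980", "Alpha"));

        // Newest year first; within a year, titles from A to Z.
        items.sort(byFields(new String[] {"YEAR", "TITLE"}, new boolean[] {false, true}));
        for (Map<String, String> item : items) {
            System.out.println(item.get("YEAR") + " " + item.get("TITLE"));
        }
    }
}
```

The second field only comes into play when two entries have the same value for the first one, which is why the two arrays must have the same length and the same order.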

Each collection has a cover page. In listing 16.10, the cover page has the text, "This document contains a collection of PDFs, one per Stanley Kubrick movie." But when you open the document, you’ll see a different page because you’ve used the setInitialDocument() method to choose one of the embedded files as the initial page.

Once you’ve completed setting all the parameters of the PdfCollection dictionary, you can use setCollection() as is done here.

Listing 16.10 KubrickMovies.java

As soon as there are fields of type TEXT, DATE, or NUMBER in the collection schema, you need to create a PdfCollectionItem for each file specification. This class comes with a plethora of addItem() methods that allow you to set the values of the different fields present in the collection schema.

NOTE If you sorted the collection shown in figure 16.4 alphabetically in ascending order based on the titles, you’d want the movie A Clockwork Orange to follow Barry Lyndon, and not the other way around. To achieve this, you need to pass the string "Clockwork Orange" with the addItem() method and the article "A" with the setPrefix() method. The title would be shown as A Clockwork Orange, but the sorting order wouldn’t be affected by the article "A".
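The idea behind the prefix can be sketched in plain Java: the value is the sort key, and the prefix only affects what is displayed. This is a model of the behavior described in the note, not the iText implementation; the Item class and the sortedDisplayTitles() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrefixSortSketch {

    /** Hypothetical item: the value is used for sorting, the prefix only for display. */
    static final class Item {
        final String prefix; // may be null when the title has no leading article
        final String value;

        Item(String prefix, String value) {
            this.prefix = prefix;
            this.value = value;
        }

        String display() {
            return prefix == null ? value : prefix + " " + value;
        }
    }

    /** Sorts by value only, then returns the titles as they would be displayed. */
    static List<String> sortedDisplayTitles(List<Item> items) {
        List<Item> copy = new ArrayList<>(items);
        copy.sort(Comparator.comparing((Item i) -> i.value));
        List<String> titles = new ArrayList<>();
        for (Item i : copy) {
            titles.add(i.display());
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("A", "Clockwork Orange"));
        items.add(new Item(null, "Full Metal Jacket"));
        items.add(new Item(null, "Barry Lyndon"));
        // Barry Lyndon comes first because the article "A" is ignored while sorting.
        System.out.println(sortedDisplayTitles(items));
    }
}
```

If the full string "A Clockwork Orange" were used as the sort key instead, that title would move to the top of the list.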

You’ve created your first portable collection. If you open it in Adobe Reader, there will be an extra entry named Portfolio in the View menu. You can use it to switch to another UI, such as from a detailed view to a tiled view, or to return to the cover page.

Figure 16.5 shows a second portable collection opened in tiled view. As you can see, some of the PDFs created in this section have been bundled along with a JPEG and a plain text file. The collection shown in the figure was created using the following listing.

Figure 16.5 A portable collection containing different file types

Listing 16.11 KubrickCollection.java

If the file type is supported by the viewer, the end user will be able to view the file directly. This is the case for the JPEG and the plain text file in figure 16.5. You can choose to open these files in an external application too. That’s also an option for file types that can’t be opened in the viewer, unless special permissions are set to avoid security hazards.

This second portfolio example, named KubrickCollection, was written to demonstrate nested /GoToE actions. The file kubrick_movies.pdf shown in figure 16.5 is the collection you created with the KubrickMovies example. The following listing adds links from the cover page of the collection to the files embedded in a file that is part of the collection.

Listing 16.12 KubrickCollection.java

The final target is a movie page that is the child of an intermediate target, namely the first attachment on page 2, which is the page with index 1. The next bit of code shows how this attachment was added.

Listing 16.13 KubrickCollection.java

In this code snippet, we have another example of a /GoToE action, demonstrating the use of the setFileAttachmentPagename() and setFileAttachmentName() methods as alternatives to setFileAttachmentPage() and setFileAttachmentIndex(). But the main reason to look at this snippet is the final line: writer.addFileAttachment(fs);.

The kubrick_movies.pdf file is added as an attachment annotation. Internally, this annotation will appear in the /Annots array of the page dictionary. These file attachment annotations do not appear in the list of embedded files and are therefore not a part of the portable collection, unless you also add them as document-level attachments.

Don’t worry: the bits and bytes of the file will be present only once inside the PDF file. The file specification will be referenced from two places: from a file attachment annotation on the page level, and from the /EmbeddedFiles name tree at the document level.

If you’ve experimented with the examples while reading this topic, you’ve probably noticed that the files with the movie information that were embedded in the PDF named kubrick_movies.pdf contain a "Go to original document" link that doesn’t work. This link is created with this listing:

Listing 16.14 KubrickMovies.java

This creates a link to the parent of a parent. It’s normal that this link doesn’t work in the context of the standalone kubrick_movies.pdf file, because there’s no grandparent. This link will only work when the file with the movie information is opened in the context of the kubrick_collection.pdf file in which the kubrick_movies.pdf file is embedded. While it’s fun to make constructions like this, you shouldn’t confuse the end user by making the family structure of embedded files and embedded goto actions too complex.

Let’s move on and look at special types of annotations that allow you to add movies, sound, and other multimedia formats as part of a document.
