Creating a PDF document in five steps with iText

Let’s copy the content of the main method of figure 1.5, and remove the comments. The numbers to the side in this listing indicate the different steps in the PDF-creation process.

Listing 1.1 HelloWorld.java

In each of the following subsections, we’ll focus on one specific step. You’ll apply small changes to step 1 in the first subsection, to step 2 in the second, and so on. This way, you’ll create several new documents that are slightly different from the one in figure 1.4. You can hold these variations on the original hello.pdf against a strong light (literally or not) and discover the differences and similarities caused by the small code changes.

Creating a new Document object

Document is the object to which you’ll add content in the form of Chunk, Phrase, Paragraph, and other high-level objects. These objects are often referred to as iText’s basic building blocks, and they’ll be discussed in topic 2. For now, we’ll only work with Paragraph objects.

MEASUREMENTS

Upon creating the Document object, you’ll define the page size and the page margins of the first page. This happens either implicitly, as is the case in step 1 of listing 1.1, or explicitly, using a com.itextpdf.text.Rectangle object for the page size and four float values for the margins, as shown here.

Listing 1.2 HelloWorldNarrow.java

In this example, a rectangle measuring 216 x 720 user units is created. This rectangle is used as the page size in the Document constructor, along with a left margin of 36 user units, a right margin of 72 user units, a top margin of 108 user units, and a bottom margin of 180 user units.

FAQ What is the measurement unit in PDF documents? Most of the measurements in PDFs are expressed in user space units. ISO-32000-1 (section 8.3.2.3) tells us “the default for the size of the unit in default user space (1/72 inch) is approximately the same as a point (pt), a unit widely used in the printing industry. It is not exactly the same; there is no universal definition of a point.” In short, 1 in. = 25.4 mm = 72 user units (which roughly corresponds to 72 pt).

If you open the document created by listing 1.2 in Adobe Reader and look at the Description tab in the Document properties dialog box (opened via File > Properties), you’ll find that the document measures 3 in. x 10 in.

iText also created a left margin of 0.5 in. (36/72), a right margin of 1 in. (72/72), a top margin of 1.5 in. (108/72), and a bottom margin of 2.5 in. (180/72).
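The arithmetic behind these numbers can be sketched in a few lines of plain Java. This isn’t iText code: the class UserUnits and its method names are made up for this illustration, and the sketch assumes the user unit still has its default size of 1/72 in.

```java
/** Hand-rolled unit arithmetic; assumes the default user unit of 1/72 inch. */
final class UserUnits {
    static final float UNITS_PER_INCH = 72f;
    static final float MILLIMETERS_PER_INCH = 25.4f;

    static float toInches(float userUnits) {
        return userUnits / UNITS_PER_INCH;
    }

    static float toMillimeters(float userUnits) {
        return toInches(userUnits) * MILLIMETERS_PER_INCH;
    }

    static float fromInches(float inches) {
        return inches * UNITS_PER_INCH;
    }

    static float fromMillimeters(float millimeters) {
        return millimeters / MILLIMETERS_PER_INCH * UNITS_PER_INCH;
    }

    public static void main(String[] args) {
        // Page size and margins used in listing 1.2, converted to inches.
        System.out.println("page:   " + toInches(216f) + " x " + toInches(720f) + " in.");
        System.out.println("left:   " + toInches(36f) + " in.");
        System.out.println("right:  " + toInches(72f) + " in.");
        System.out.println("top:    " + toInches(108f) + " in.");
        System.out.println("bottom: " + toInches(180f) + " in.");
    }
}
```

Going the other way, fromInches(10f) gives back the 720 user units used for the page height.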

If you don’t like doing all that math, there’s a Utilities class in iText with static methods that help you switch among points, inches, and millimeters: millimetersToPoints(), millimetersToInches(), pointsToMillimeters(), pointsToInches(), inchesToMillimeters(), and inchesToPoints(). All these methods expect a float as their value.

Listing 1.3 HelloWorldMaximum.java

Looking at the first line in this code snippet, you might expect a document with a page measuring 200 in. x 200 in., but when you look at the document properties of the resulting file, you’ll see that it measures 15,000,000 in. x 15,000,000 in. That’s because you’ve changed the user unit to 75,000 in the last line of listing 1.3. Now, one user unit corresponds with 75,000 points, and you’ve created a PDF document with the largest possible page size.

PAGE SIZE

Theoretically, you could create pages of any size, but the PDF specification imposes limits depending on the PDF version of the document.

Table 1.1 Minimum and maximum size of a page depending on the PDF version

PDF version	Minimum size	Maximum size
PDF 1.3 or earlier	72 x 72 units (1 in. x 1 in.)	3240 x 3240 units (45 in. x 45 in.)
PDF 1.4 and later	3 x 3 units (approximately 0.04 in. x 0.04 in.)	14,400 x 14,400 units (200 in. x 200 in.)

Changing the user unit has been possible since PDF 1.6. The minimum value of the user unit is 1 (this is the default; 1 unit = 1/72 in.); the maximum value is 75,000 (1 unit = 75,000/72 in., or approximately 1,042 in.).
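If you want to check the 15,000,000 in. figure yourself, the calculation is a single multiplication and division. The following sketch is plain Java, not iText; the class name MaxPageSize and the method sideInInches() are invented for the example.

```java
/** Checks the page-size arithmetic for a given user unit (illustrative names only). */
final class MaxPageSize {
    static final double POINTS_PER_INCH = 72.0;

    /** Physical length in inches of a side given in user units, for a given user unit size in points. */
    static double sideInInches(double sideInUserUnits, double userUnit) {
        return sideInUserUnits * userUnit / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Default user unit: the largest side of 14,400 units from table 1.1.
        System.out.println(sideInInches(14_400, 1) + " in.");
        // Maximum user unit of 75,000 applied to the same 14,400 units.
        System.out.println(sideInInches(14_400, 75_000) + " in.");
        // Size of a single user unit at the maximum setting.
        System.out.println(sideInInches(1, 75_000) + " in.");
    }
}
```

The second call is the case from listing 1.3: 14,400 units of 75,000 points each, divided by 72 points per inch.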

But enough about exotic page sizes; you’re probably interested in the standard paper sizes. The default page size in iText, if you create a Document object without any parameters, is A4, which is the most common paper size in Europe, Asia, and Latin America. It’s specified by the International Organization for Standardization (ISO) in ISO 216. An A4 document measures 210 mm x 297 mm, or 8.3 in. x 11.7 in., or 595 pt x 842 pt.

Note that the conversion methods of the Utilities class refer to points, not to user units. That’s because the default value of the user unit corresponds with a point, but it’s possible to change this default.

If you want to create a document in another standard format, take a look at the PageSize class. This class was written for your convenience, and it contains a list of static final Rectangle objects, offering a wide selection of standard paper sizes, including A0 to A10, B0 to B10, and the American standard sizes: LETTER, LEGAL, LEDGER, and TABLOID. Listing 1.4 shows how to adapt the initial HelloWorld example so that it produces a PDF document saying “Hello World!” on a page that’s the American letter paper size.

Listing 1.4 HelloWorldLetter.java

The orientation of most of the paper sizes defined in PageSize is portrait. You can change this to landscape by invoking the rotate() method on the Rectangle.

Listing 1.5 HelloWorldLandscape1.java

Another way to create a Document in landscape orientation is to create a Rectangle object with a width that is greater than its height.

Listing 1.6 HelloWorldLandscape2.java

The results of both landscape examples look exactly the same in Adobe Reader. The Reader’s Description tab doesn’t show any difference in size. Both PDF documents have a page size of 11 in. x 8.5 in. (instead of 8.5 in. x 11 in.), but there are subtle differences internally:

■ In the first file, the page is defined with a size that has a width smaller than the height, but with a rotation of 90 degrees.

■ The second file has the page size you defined without any rotation (a rotation of 0 degrees).

This difference will matter when you want to manipulate the PDF. We’ll return to this issue in topic 6.

PAGE MARGINS

In listing 1.2, you defined margins using the constructor of the Document object, and you added a Paragraph to it. In the next two examples, you’ll define the page size and margins using the setPageSize() and setMargins() methods. You can use these methods at any time in the document’s creation process, but be aware that the change will never affect the current page, only the next page.

In these examples, you’ll add paragraphs that are aligned on both sides (justified text), so you can clearly see the left and right margins. You’ll add enough paragraphs to cause a page break, so you can make sure the bottom margin is respected.

Suppose this document consists of pages that are to be printed on both sides, and bound into a book. Depending on the way the book is bound, you might want a larger or smaller margin on the inner edges of the pages: the left margin of an odd-numbered page should correspond to the right margin of an even-numbered page. The same goes for the opposite margins. In short, you want the margins to be mirrored.

Listing 1.7 HelloWorldMirroredMargins.java

Listing 1.7 assumes that the spine of the book is to the left (for Western books) or to the right (for Japanese books). But some books are bound in a completely different way, with the spine at the top or bottom of the pages. In that case, you’d need to use the method shown in listing 1.8.

Listing 1.8 HelloWorldMirroredMarginsTop.java

Now the top and bottom margins are mirrored instead of the left and right margins.
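To make the mirroring rule concrete, here’s a small sketch of the calculation in plain Java. It doesn’t use iText and isn’t how iText implements mirroring internally; the Margins type and both method names are hypothetical, and the sketch assumes that pages are numbered from 1 and that odd-numbered pages use the margins as you defined them.

```java
/** Illustrates margin mirroring for odd and even pages (not iText code). */
final class MirroredMargins {
    /** Page margins in user units; a hypothetical helper type. */
    static final class Margins {
        final float left;
        final float right;
        final float top;
        final float bottom;

        Margins(float left, float right, float top, float bottom) {
            this.left = left;
            this.right = right;
            this.top = top;
            this.bottom = bottom;
        }

        @Override
        public String toString() {
            return "left=" + left + ", right=" + right + ", top=" + top + ", bottom=" + bottom;
        }
    }

    /** Spine at the left or right: even pages swap the left and right margins. */
    static Margins mirrorLeftRight(Margins odd, int pageNumber) {
        if (pageNumber % 2 == 1) {
            return odd;
        }
        return new Margins(odd.right, odd.left, odd.top, odd.bottom);
    }

    /** Spine at the top or bottom: even pages swap the top and bottom margins. */
    static Margins mirrorTopBottom(Margins odd, int pageNumber) {
        if (pageNumber % 2 == 1) {
            return odd;
        }
        return new Margins(odd.left, odd.right, odd.bottom, odd.top);
    }

    public static void main(String[] args) {
        // The margin values from listing 1.2, reused here as sample data.
        Margins base = new Margins(36f, 72f, 108f, 180f);
        for (int page = 1; page <= 2; page++) {
            System.out.println("page " + page + " (left/right): " + mirrorLeftRight(base, page));
            System.out.println("page " + page + " (top/bottom): " + mirrorTopBottom(base, page));
        }
    }
}
```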

But maybe we’re getting ahead of ourselves. We’re already adding content, but we haven’t yet discussed step 2 in listing 1.1 in the PDF creation process.

Getting a PdfWriter instance

PdfWriter is the class responsible for writing the PDF file. You can also add contents, such as annotations, to PdfWriter. As opposed to the high-level objects added to the Document object, manipulations on PdfWriter are often referred to as low-level access and writing to the direct content. You’ll find out more about these concepts in topic 3.

Step 2 in listing 1.1 in the PDF creation process combines two actions:

■ It associates a Document with the PdfWriter. This writer will “listen” to the document. High-level objects, such as a Paragraph, will be translated into low-level operations. For example, iText will generate the PDF syntax that draws the textual content of a paragraph at a specific position on a page, taking into account the page size and margins.

■ It tells the PdfWriter to which OutputStream the file should be written. In the previous examples, you have written the content to a FileOutputStream, but you could have written to any other type of OutputStream. You could even have written the bytes of a PDF file to System.out.

In rare circumstances, creating a writer instance can cause a DocumentException.

EXCEPTIONS

DocumentException is the most general exception in iText. It can occur in step 2 or step 4 of listing 1.1. For example, if you try adding a Paragraph before you’ve done step 3, you’ll get the following error message: “The document isn’t open yet; you can only add metadata information.” DocumentExceptions also occur when manipulating existing documents. For instance, “Append mode requires a document without errors even if recovery was possible.”

If you look at listing 1.1, you see that you can also expect an IOException. Once you start using resources such as images, fonts, or existing PDFs, this exception can occur if something goes wrong while reading from an InputStream.

In the examples we’ve looked at so far, the only IOException that could be thrown is a FileNotFoundException. This happens when you’re trying to create a hello.pdf file, but you already have a file with that name opened—and locked—in Adobe Reader. (This happened to me all the time while writing the examples for this topic.) Or maybe you’re trying to create the file in the results/part1/topic01 directory, but this directory doesn’t exist on your filesystem. The empty results directories are provided with the example archives to avoid this problem.
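If you’d rather not depend on the directories being there, you can create them before opening the stream. The sketch below uses only the standard Java library; the class OutputDirectory and its methods are illustrative names, and the path is just a sample inside a temporary directory. It first shows the failure on a missing directory, then the fix.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Shows the FileNotFoundException on a missing directory and one way to avoid it. */
final class OutputDirectory {
    /** Creates the missing parent directories, then opens the file for writing. */
    static OutputStream open(Path file) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return new FileOutputStream(file.toFile());
    }

    /** Returns true if opening the file directly fails with a FileNotFoundException. */
    static boolean failsToOpen(Path file) {
        try (OutputStream out = new FileOutputStream(file.toFile())) {
            return false;
        } catch (FileNotFoundException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs both attempts in a fresh temporary directory. */
    static boolean demo() {
        try {
            Path base = Files.createTempDirectory("itext-sketch");
            Path target = base.resolve("results").resolve("part1").resolve("hello.pdf");
            boolean failedFirst = failsToOpen(target); // parent directories don't exist yet
            try (OutputStream out = open(target)) {
                out.write('%'); // placeholder byte, not a real PDF
            }
            return failedFirst && Files.size(target) == 1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() ? "second attempt succeeded" : "unexpected result");
    }
}
```

The stream returned by open() could be passed to any code that expects an OutputStream.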

OTHER OUTPUTSTREAMS

While you’re adding content to the Document, the PdfWriter gradually writes a PDF file to the OutputStream. This PDF file will be written to a file on disk if you choose a FileOutputStream. In a web application, you’ll generally prefer serving the PDF to a web browser without saving it on the server, so you could write directly to the ServletOutputStream, using response.getOutputStream() in your servlets. This will work with some browsers, but unfortunately not with all. Topic 9 will explain why it’s better to write the complete file to memory before transferring the bytes to the OutputStream of an HttpServletResponse object.

Here’s how to write a file to memory using a ByteArrayOutputStream.

Listing 1.9 HelloWorldMemory.java

Observe that the PDF is created in memory in the first part of this snippet; nothing is written to disk. The bytes are written to a file in the last three lines of the snippet to prove that what was generated in memory represents a valid PDF file.
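The pattern itself doesn’t depend on iText: any code that writes to an OutputStream can be pointed at a ByteArrayOutputStream first. Here’s a minimal sketch of that pattern in plain Java. The produce() method is a hypothetical stand-in for the document-creation code and only writes placeholder bytes, not a real PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Collects output in memory first, then copies the bytes to a file. */
final class InMemoryOutput {
    /** Hypothetical stand-in for the code that generates the document. */
    static void produce(OutputStream out) throws IOException {
        out.write("placeholder bytes, not a real PDF".getBytes(StandardCharsets.US_ASCII));
    }

    /** Runs the producer against a ByteArrayOutputStream; nothing touches the disk. */
    static byte[] produceInMemory() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            produce(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = produceInMemory();
        // Only now are the bytes written to disk (a temporary file in this sketch).
        Path target = Files.createTempFile("hello", ".bin");
        Files.write(target, bytes);
        System.out.println(bytes.length + " bytes written to " + target);
    }
}
```

In a servlet, the same byte array could be written to the response’s OutputStream instead of to a file.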

Now that you have all the infrastructure in place, it’s time to open the Document.

Opening the Document

Java programmers may not be used to having to open streams before being able to add content. When you create a new stream in Java, you can start writing bytes, chars, and Strings to it right away. With iText, it’s mandatory to open the document first.

When a Document object is opened, a lot of initializations take place, and the file header is written to the OutputStream.

THE FILE HEADER AND THE PDF VERSION

Figure 1.6 shows your first PDF file, hello.pdf, opened in the Notepad++ text editor.

Figure 1.6 hello.pdf opened in Notepad++

As figure 1.6 shows, the first lines of the file form the header of a PDF file. The structure of a PDF file, with its header, body, cross-reference table, and footer, will be discussed in great detail in topic 13. For now, it’s sufficient to know that the first line gives you an indication of the PDF version that is used.
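Because the header is plain text, you can read the version without a PDF library. The following sketch is plain Java; the class PdfHeader and the method versionOf() are names invented for this example. It only looks at the header line, so it ignores any version stored elsewhere in the file.

```java
import java.nio.charset.StandardCharsets;

/** Reads the version number from a header line such as "%PDF-1.4". */
final class PdfHeader {
    private static final String PREFIX = "%PDF-";

    /** Returns the version found in the header, or null if the bytes don't start with one. */
    static String versionOf(byte[] fileStart) {
        int length = Math.min(fileStart.length, 16);
        String text = new String(fileStart, 0, length, StandardCharsets.ISO_8859_1);
        if (!text.startsWith(PREFIX)) {
            return null;
        }
        int end = PREFIX.length();
        // Accept digits and dots only; stop at the end of the line or anything else.
        while (end < text.length()
                && (Character.isDigit(text.charAt(end)) || text.charAt(end) == '.')) {
            end++;
        }
        return end > PREFIX.length() ? text.substring(PREFIX.length(), end) : null;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("header version: " + versionOf(sample));
    }
}
```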

By default, iText uses version 1.4, which was introduced in 2001. If you introduce functionality newer than what’s available in PDF 1.4 after step 3 in listing 1.1, it’s your responsibility to set the correct PDF version before step 3. Otherwise, the default version, PDF-1.4, will be written to the OutputStream, and there’s no going back.

NOTE Beginning with PDF 1.4, the PDF version can also be stored elsewhere in the PDF (in the root object of the document, aka the catalog; see topic 13). This implies that a file with header PDF-1.4 can be seen as a PDF 1.6 file if it’s defined that way in the document root.

In some cases, iText changes the PDF version automatically. In listing 1.3, you changed the user unit, and this capability was introduced in version 1.6 of the PDF specification. Because you changed the user unit before step 3, iText was able to update the PDF version in the header to %PDF-1.6.

It’s a better practice to set the version number with PdfWriter.setPdfVersion() if you use PDF features that are newer than what was available in PDF 1.4. Here’s how to change the PDF version to 1.7.

Listing 1.10 HelloWorldVersion_1_7.java

It’s not forbidden for the PDF version in the header to be different from the PDF version in the catalog, but it’s good practice to make setting the PDF version a part of your initializations to avoid ambiguity.

INITIALIZATIONS

Document.open() also performs many initializations. For instance, you can’t access the outline of the bookmarks before the document has been opened. If you want to create an encrypted PDF file, you must set the encryption type, strength, and permissions before step 3 in listing 1.1.

FAQ I have set feature X, and it doesn’t work, or it doesn’t work for page 1, only for the pages that follow. Why is that? Many settings, such as the page size and margins, only go into effect on the next page. This may seem trivial, but it’s a common question for new iText users. If you want the feature to work on page 1, define it before opening the document.

After step 3, the first page of our document is available for you to add content (step 4).

Adding content

In this section, we’re creating simple Hello World PDF documents, learning the elementary mechanics of iText’s PDF creation process. Once these are understood, you can start generating real-world documents containing real-world data.

To learn how to implement step 4, you’ll copy steps 1, 2, 3, and 5 from listing 1.1 into an application, then focus on step 4: adding content to the PDF document.

There are different ways to add content. Up until now, you’ve been adding one or more high-level objects of type Paragraph to the Document. In the next topic, you’ll learn about other objects, such as Chunk, Phrase, Anchor, and List. You can also add content to a page using low-level methods.

DIRECT CONTENT

Listing 1.11 shows a variation on this topic’s initial “Hello World” example. Although this is a rather complex example for a first topic about using iText, it will give you an idea of iText’s internal PDF-creation process.

Listing 1.11 HelloWorldDirect.java

Steps 1, 3, and 5 are the same as they were in listing 1.1, but you need to make a small change to step 2. Instead of using an unnamed instance of PdfWriter, you now give it a name: writer. You need this instance because you want to grab a canvas on which you can draw lines and shapes, and, in this case, text. In listing 1.11, comment sections were added, reflecting the PDF syntax that is written by each method.

By using the setCompressionLevel() method with a parameter of 0, you avoid compressing the stream. This allows you to read the PDF syntax when opening the file in a text editor. Figure 1.7 shows the resulting PDF when opened in WordPad.

This screenshot contains less gibberish than figure 1.6, though it’s showing the syntax of a similar “Hello World” PDF. You’ll recognize the PDF header, followed by a PDF object with number 2: 2 0 obj. After reading part 4 of this topic, you’ll understand that this object is a stream object, the content stream of the first page. In figure 1.6, the content stream was compressed, but in figure 1.7, the compression is zero. You can see the syntax in clear text, although you’ll need to read topic 14 to decipher what it means.

NOTE Setting the compression level to 0 can be interesting if you need to debug your PDF file, but you shouldn’t change the compression level in a production environment, because the file size of the resulting PDFs will be bigger than files generated using the default compression level.

As you move on in this topic, you’ll find out that you’ll need to add content directly to the page on different occasions, such as when adding page numbers, or when drawing custom borders for tables. As you might imagine, you’ll need a sound understanding of the PDF reference to achieve all this.

Figure 1.7 PDF document opened in WordPad

FAQ I’ve added text using low-level methods and it doesn’t respect the margins, nor does the text wrap at the end of the line. What is wrong? That is expected behavior. When adding content like this, you need to do all the math necessary to split a String in different lines, and add it at the appropriate coordinates. Also, make sure that you don’t add the text outside the visible area of the page; this is a common mistake when adding text to an existing PDF document.
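To give you an idea of the math involved, here’s a rough sketch of greedy line breaking in plain Java. It isn’t iText code, and it makes a strong simplifying assumption: every character is given the same hypothetical width, whereas real code would have to ask the font for the width of each string. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Greedy word wrapping against a maximum line width (fixed character width assumed). */
final class ManualWrap {
    /** Splits text into lines that each fit within maxWidth, breaking only at whitespace. */
    static List<String> wrap(String text, float maxWidth, float charWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            int candidateLength = line.length() == 0
                    ? word.length()
                    : line.length() + 1 + word.length();
            // Start a new line when the word no longer fits on the current one.
            if (line.length() > 0 && candidateLength * charWidth > maxWidth) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Available width: an A4 page of 595 units minus assumed margins of 36 units on each side.
        float available = 595f - 36f - 36f;
        for (String line : wrap("Hello World Hello World Hello World", available, 6f)) {
            System.out.println(line);
        }
    }
}
```

A word that is wider than the available width still ends up on a line of its own; a fuller version would have to hyphenate or clip it. Each resulting line would then need its own coordinates, moving down the page by the leading you choose.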

Listing 1.11 gets increasingly complex as soon as you need to add more text. Fortunately, iText comes to the rescue: you can use convenience classes and methods that significantly reduce the complexity and the lines of code needed to work with direct content.

CONVENIENCE CLASSES AND METHODS

Listing 1.12 is identical to listing 1.11 as far as steps 1, 2, 3, and 5 are concerned, but in step 4 you create a Phrase object and add this to the direct content, named canvas, using the static method ColumnText.showTextAligned(). The phrase hello will be added left aligned at coordinates (36, 788) with rotation 0.

Listing 1.12 HelloWorldColumn.java

If you open the resulting PDFs from listings 1.11 and 1.12 in Adobe Reader, you’ll see that both documents look identical. If you open them in a text editor, you’ll notice that the syntax is slightly different. There’s usually more than one way to create PDF documents that look like identical twins when opened in a PDF viewer. And even if you create two identical PDF documents using the exact same code, there will be small differences between the two resulting files. That’s inherent to the PDF format.

We’re almost finished discussing the five steps in the PDF creation process. It’s time for step 5.

Closing the Document

One of the typical uses of iText is to create documents containing many pages. For example, a financial institution uses iText to create PDFs of bank statements, consisting of 100,000 or more pages. You don’t want to keep the content of that many pages in memory, and that’s why iText will write content to the OutputStream as soon as possible. If a page is full, the content stream of that page will be written to the OutputStream; if you’re writing to a file, that content will be flushed from the memory.

CONTENT FLUSHED TO THE OUTPUTSTREAM VERSUS CONTENT KEPT IN MEMORY

If you return to figure 1.6 or 1.7, you’ll see that object 2, the page content stream of page 1, appears as the first object in the file. Other objects will be added at a higher byte position, regardless of their object number. iText has to keep certain objects in memory because there’s a chance you’ll reuse them and change them during the creation process. You’ll use this mechanism in section 5.4.2 to add the total number of pages—a number that is only known when the final page is reached—to all the previous pages.

Specific objects, such as the catalog and the info dictionary, will be added last by iText. They’re written to the OutputStream upon closing the Document. There’s also the cross-reference table, an important structure that is written immediately after the catalog and info dictionary. It contains the byte positions of the PDF objects that define the document. It’s followed by the trailer, containing information that enables an application to quickly find the start of the cross-reference table, and objects such as the info dictionary. Finally, the following byte sequence will be added, indicating that the file has been completely written:

%%EOF

You don’t need to close the OutputStream you created in step 2. iText will close this stream right after the end-of-file sequence.
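One practical consequence is that you can test whether a file was written completely by looking at its last bytes. The sketch below is plain Java with illustrative names; it checks for the end-of-file marker %%EOF defined by the PDF specification and tolerates trailing line breaks. It says nothing about whether the rest of the file is valid.

```java
import java.nio.charset.StandardCharsets;

/** Checks whether a byte array ends with the PDF end-of-file marker. */
final class EofMarker {
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    static boolean isComplete(byte[] fileBytes) {
        int end = fileBytes.length;
        // Skip trailing line breaks and spaces after the marker.
        while (end > 0 && (fileBytes[end - 1] == '\n'
                || fileBytes[end - 1] == '\r'
                || fileBytes[end - 1] == ' ')) {
            end--;
        }
        if (end < MARKER.length) {
            return false;
        }
        for (int i = 0; i < MARKER.length; i++) {
            if (fileBytes[end - MARKER.length + i] != MARKER[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] truncated = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] finished = "%PDF-1.4\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("truncated: " + isComplete(truncated));
        System.out.println("finished:  " + isComplete(finished));
    }
}
```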

KEEPING THE OUTPUTSTREAM OPEN

There may be occasions when you don’t want the stream to be closed automatically.

Listing 1.13 HelloZip.java

In (1), you create a ZipOutputStream. It will generate a zip archive named hello.zip containing different PDF files. You use this OutputStream (2) to create an instance of PdfWriter, but you immediately use the setCloseStream() method to tell the writer that it shouldn’t close the stream. If you don’t do this, the ZipOutputStream will be closed (3), and a java.io.IOException will be thrown (4), saying “Stream closed.” You have to wait until you’ve closed the final entry added to the zip file before you can close the ZipOutputStream (5).
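The underlying problem can be reproduced without iText: any component that closes the stream it was given will break a ZipOutputStream that still needs to receive more entries. The following plain-Java sketch imitates the situation. The writeAndClose() method is a hypothetical stand-in for a writer that closes its stream, and NonClosingOutputStream is an illustrative wrapper that plays the role setCloseStream() plays in iText; it isn’t how iText implements that method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Writes several entries to one zip stream while each producer "closes" its stream. */
final class KeepStreamOpen {
    /** Forwards all writes, but close() only flushes and leaves the target open. */
    static final class NonClosingOutputStream extends FilterOutputStream {
        NonClosingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
        }

        @Override
        public void close() throws IOException {
            flush();
        }
    }

    /** Hypothetical stand-in for a writer that closes its stream when it's done. */
    static void writeAndClose(OutputStream out, String content) throws IOException {
        try (OutputStream target = out) {
            target.write(content.getBytes(StandardCharsets.US_ASCII));
        }
    }

    /** Builds a zip archive with three entries in memory. */
    static byte[] zipThreeEntries() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (int i = 1; i <= 3; i++) {
                zip.putNextEntry(new ZipEntry("hello_" + i + ".txt"));
                writeAndClose(new NonClosingOutputStream(zip), "Hello " + i);
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the archive back and counts its entries. */
    static int countEntries(byte[] zipBytes) {
        int count = 0;
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            while (in.getNextEntry() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("entries: " + countEntries(zipThreeEntries()));
    }
}
```

If you pass the ZipOutputStream to writeAndClose() directly, the first entry closes the archive and the next putNextEntry() call fails.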

This example concludes our series of simple “Hello World” examples. You now have a solid first impression of how to use iText to create new PDF documents.

Summary

In this first introductory topic, you’ve had a brief introduction to PDF, learning what is possible in PDF and what is possible with iText.

You’ve compiled and executed a first example, generating a simple “Hello World” PDF document. Using listings 1.1 through 1.13, you’ve created 15 similar files, of which three were archived in a zip file. In doing so, you’ve gone through the five elementary steps in iText’s PDF-creation process: create a Document, get a PdfWriter instance, open the Document, add content, close the Document.

This topic contained many forward references, and some of the examples introduced functionality that was probably too complex for a first topic, but don’t worry: every line of code will be explained further on in the topic.

In the next topic, you’ll create PDFs with content that is more meaningful. I’ll introduce a simple movie database and you’ll use iText’s high-level objects to publish the content of this database in different PDF documents.