Friday, December 19, 2008

Using iText to convert TIFF to PDF and to combine multiple PDFs into one PDF

We recently needed to take a Microsoft Excel file along with one or more PDF "attachments" and 1) convert the Excel file to PDF, and 2) combine all the PDFs into a single PDF.

Unfortunately, we were unable to find an open source Java solution for converting an Excel file to PDF. If you only want to extract information from the Excel file, you can do so using Apache POI and then generate a PDF file from that data using iText. This wasn't an appropriate solution for us. An interim alternative was to print the Excel file to TIFF using the Microsoft Office Document Image Writer and then convert the TIFF to PDF using iText. The results are not ideal, but workable.

Here are two Java methods: one converts a TIFF to PDF, and the other combines multiple PDFs into one. Let me know if you find a free Java solution for converting Excel to PDF.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.pdf.PRAcroForm;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.pdf.SimpleBookmark;

public class FileConversions {

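    /**
     * Convert a TIFF file to a PDF.
     * Only the first page of a multi-page TIFF is converted.
     * 
     * @param tiffFile the raw bytes of the TIFF image
     * @return the raw bytes of the generated single-page PDF
     * @throws DocumentException if iText cannot build the PDF
     * @throws MalformedURLException if iText cannot resolve the image source
     * @throws IOException if the image data cannot be read
     */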
    public static byte[] convertTiffToPdf(byte[] tiffFile) 
            throws DocumentException, MalformedURLException, IOException {

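        ByteArrayOutputStream outfile = new ByteArrayOutputStream();
        // Landscape A4 page, to suit a wide spreadsheet printout
        Document document = new Document(PageSize.A4.rotate());
        PdfWriter writer = PdfWriter.getInstance(document, outfile);
        // Keep images in the order in which they are added
        writer.setStrictImageSequence(true);
        document.open();
        Image tiff = Image.getInstance(tiffFile);
        // Scale to fit an 800 x 600 point box, preserving the aspect ratio
        tiff.scaleToFit(800, 600);
        document.add(tiff);
        // Closing the document finishes writing the PDF to the stream
        document.close();
        return outfile.toByteArray();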
    }

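    /**
     * Combine multiple PDFs into a single PDF.
     *
     * @param pdfs the raw bytes of each PDF, in the order they should appear
     * @param combinedPdfFile the file to write the combined PDF to
     * @throws IOException if a PDF cannot be read or the output file cannot be written
     * @throws DocumentException if iText cannot build the combined PDF
     * @see http://itext.ugent.be/library/com/lowagie/examples/general/copystamp/Concatenate.java
     */
    public static void combinePdfFiles(List<byte[]> pdfs, File combinedPdfFile)
            throws IOException, DocumentException {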

        PdfReader reader = null;
        Document document = null;
        PdfCopy  writer = null;
        ArrayList master = new ArrayList();
        int pageOffset = 0;

        for (byte[] pdf : pdfs) {
            reader = new PdfReader(pdf);
            reader.consolidateNamedDestinations();
            int n = reader.getNumberOfPages();
            // Shift this PDF's bookmarks by the number of pages already copied
            List bookmarks = SimpleBookmark.getBookmark(reader);
            if (bookmarks != null) {
                if (pageOffset != 0) {
                    SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
                }
                master.addAll(bookmarks);
            }
            pageOffset += n;

            if (document == null) {
                // step 1: creation of a document-object
                document = new Document(reader.getPageSizeWithRotation(1));
                // step 2: we create a writer that listens to the document
                writer = new PdfCopy(document, new FileOutputStream(combinedPdfFile));
                // step 3: we open the document
                document.open();
            }
            // step 4: we add content (iText page numbers start at 1)
            PdfImportedPage page;
            for (int i = 1; i <= n; i++) {
                page = writer.getImportedPage(reader, i);
                writer.addPage(page);
            }
            PRAcroForm form = reader.getAcroForm();
            if (form != null) {
                writer.copyAcroForm(reader);
            }
        }
        if (!master.isEmpty()) {
            writer.setOutlines(master);
        }
        if (document != null) {
            document.close();
        }
    }
}
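
The scaleToFit(800, 600) call in convertTiffToPdf resizes the image to fit inside an 800 x 600 point box while keeping its aspect ratio. That box is slightly larger than the usable area of a landscape A4 page (842 x 595 points) with iText's default 36 point margins, so a large image may run into the margins or off the page. Below is a minimal sketch of the fitting arithmetic in plain Java, without iText. The class ScaleToFitSketch and its fit method are hypothetical names used only for illustration, not part of iText.

```java
/** Hypothetical helper (not part of iText) showing aspect-preserving fit arithmetic. */
public class ScaleToFitSketch {

    /** Returns {width, height} after scaling to fit the box, keeping the aspect ratio. */
    static float[] fit(float width, float height, float boxWidth, float boxHeight) {
        // The smaller of the two ratios is the largest scale that fits both dimensions.
        float scale = Math.min(boxWidth / width, boxHeight / height);
        return new float[] { width * scale, height * scale };
    }

    public static void main(String[] args) {
        // A4 landscape is 842 x 595 points; 36 points is the default margin on each side.
        float usableWidth = 842f - 2 * 36f;   // 770 points
        float usableHeight = 595f - 2 * 36f;  // 523 points

        // Example: a 1600 x 1000 image fitted into the usable page area.
        float[] fitted = fit(1600f, 1000f, usableWidth, usableHeight);
        System.out.println(fitted[0] + " x " + fitted[1]);
    }
}
```

If the margins matter for your output, passing the usable width and height computed this way to scaleToFit, instead of the fixed 800 and 600, is one option to consider.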

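In combinePdfFiles, the bookmarks of each input PDF are shifted by the number of pages already copied, so they still point at the right pages in the combined file. Here is a rough sketch of just that bookkeeping, again in plain Java without iText. PageOffsetSketch and offsets are hypothetical names, and the page counts are made-up sample values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the running page offset; no iText required. */
public class PageOffsetSketch {

    /** For each input document, returns how many pages precede it in the merged output. */
    static List<Integer> offsets(List<Integer> pageCounts) {
        List<Integer> result = new ArrayList<Integer>();
        int pageOffset = 0;
        for (int n : pageCounts) {
            result.add(pageOffset);  // this document's bookmarks shift by this amount
            pageOffset += n;         // same update as in combinePdfFiles
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> counts = new ArrayList<Integer>();
        counts.add(3);
        counts.add(2);
        counts.add(4);
        System.out.println(offsets(counts)); // prints [0, 3, 5]
    }
}
```

With these sample counts, a bookmark pointing at page 1 of the second PDF ends up on page 4 of the combined PDF.
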
5 comments:

Anonymous said...

This is a fine piece of code. It is very helpful. Thanks to Karl.

Anant said...

Have you looked at JODConverter? I know it does Word docs pretty well. Conversion is done with OpenOffice running in headless mode on port 8100.
Not sure if it does Excel, but I would think it would.

Ballesteros said...

Is it possible to convert a color TIFF file to a black-and-white PDF file?

Thanks

Anusree said...

Thanks Karl. I have used your sample code to generate a PDF from a TIFF. It works very well for a single-page TIFF file. Do you have any sample code for multi-page TIFF files? This did not work for a multi-page TIFF file.

Anonymous said...

http://stackoverflow.com/questions/7721447/creating-pdf-from-tiff-image-using-itext