Apache PDFBox Split PDF Document in Java

by MemoryNotFound · February 20, 2018

The following example demonstrates how to use Apache PDFBox to split a PDF document.

Maven Dependencies

We use Apache Maven to manage our project dependencies. Make sure the following dependency is on the classpath.

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.8</version>
</dependency>

Apache PDFBox Split All Pages of PDF Document

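We can split all pages of the PDF document using the Splitter class. Its split() method returns a list of new PDDocument instances; by default, every page becomes its own document.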

package com.memorynotfound.pdf.pdfbox;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class SplitPdf {

    public static void main(String[] args) {

        try (PDDocument document = PDDocument.load(new File("/tmp/example.pdf"))) {

            // Instantiate the Splitter; by default every page becomes a new document
            Splitter splitter = new Splitter();

            // Split the PDF document into a list of single-page documents
            List<PDDocument> pages = splitter.split(document);

            // Save each page as an individual document and close it,
            // while the source document is still open
            int i = 1;
            for (PDDocument page : pages) {
                try (PDDocument pd = page) {
                    pd.save("/tmp/split_" + i + ".pdf");
                }
                i++;
            }

        } catch (IOException e) {
            System.err.println("Exception while trying to split pdf document - " + e);
        }
    }

}

Output

When we run the previous application, each page of the PDF document is saved as its own PDF document (split_1.pdf, split_2.pdf and so on). You can see the result in the following image.
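The Splitter can also produce multi-page parts instead of one document per page. The sketch below assumes PDFBox 2.0.x (as in the Maven dependency above) and uses setSplitAtPage(int) to start a new output document every N pages. The class name SplitPdfInChunks, the helper splitInChunks and the /tmp/chunk_*.pdf file names are hypothetical, chosen only for illustration.

```java
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class SplitPdfInChunks {

    // Hypothetical helper: split the whole document into parts of at most
    // pagesPerPart pages each. The caller must close every returned document.
    static List<PDDocument> splitInChunks(PDDocument document, int pagesPerPart) throws IOException {
        Splitter splitter = new Splitter();
        // Start a new output document every pagesPerPart pages (the default is 1)
        splitter.setSplitAtPage(pagesPerPart);
        return splitter.split(document);
    }

    public static void main(String[] args) {
        try (PDDocument document = PDDocument.load(new File("/tmp/example.pdf"))) {
            List<PDDocument> parts = splitInChunks(document, 2);
            int i = 1;
            for (PDDocument part : parts) {
                // Save and close each part while the source document is still open,
                // since the parts may reference resources of the source
                try (PDDocument pd = part) {
                    pd.save("/tmp/chunk_" + i + ".pdf");
                }
                i++;
            }
        } catch (IOException e) {
            System.err.println("Exception while trying to split pdf document - " + e);
        }
    }
}
```

With a five-page input and pagesPerPart set to 2, this should yield three documents of 2, 2 and 1 pages.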

Apache PDFBox Split Specific Page of PDF Document

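We can split only specific pages of the PDF document by restricting the Splitter to a page range with the setStartPage() and setEndPage() methods. Page numbers are 1-based and both bounds are inclusive, so the following example extracts only page 2.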

package com.memorynotfound.pdf.pdfbox;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class SplitPdf {

    public static void main(String[] args) {

        try (PDDocument document = PDDocument.load(new File("/tmp/example.pdf"))) {

            // Instantiate the Splitter and restrict it to page 2 only
            // (page numbers are 1-based and the range is inclusive)
            Splitter splitter = new Splitter();
            splitter.setStartPage(2);
            splitter.setEndPage(2);

            // Split the selected page range into single-page documents
            List<PDDocument> pages = splitter.split(document);

            // Save each extracted page as an individual document and close it
            int i = 1;
            for (PDDocument page : pages) {
                try (PDDocument pd = page) {
                    pd.save("/tmp/split_" + i + ".pdf");
                }
                i++;
            }

        } catch (IOException e) {
            System.err.println("Exception while trying to split pdf document - " + e);
        }
    }

}

Output

When we run the previous application, only page 2 of the PDF document is extracted into its own PDF document (split_1.pdf). You can see the result in the following image.
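A minimal sketch of a variant of the page-range example, again assuming PDFBox 2.0.x: it validates the 1-based, inclusive range before calling setStartPage() and setEndPage(), and names each output file after its page number in the original document rather than after its position in the result list. ExtractPageRange, extractPages and the /tmp/page_*.pdf file names are hypothetical.

```java
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

public class ExtractPageRange {

    // Hypothetical helper: extract pages startPage..endPage (1-based, inclusive)
    // as one-page documents. The caller must close every returned document.
    static List<PDDocument> extractPages(PDDocument document, int startPage, int endPage) throws IOException {
        int pageCount = document.getNumberOfPages();
        if (startPage < 1 || endPage < startPage || endPage > pageCount) {
            throw new IllegalArgumentException(
                    "invalid range " + startPage + "-" + endPage + " for " + pageCount + " pages");
        }
        // Use a fresh Splitter per call and restrict it to the requested range
        Splitter splitter = new Splitter();
        splitter.setStartPage(startPage);
        splitter.setEndPage(endPage);
        return splitter.split(document);
    }

    public static void main(String[] args) {
        int startPage = 2;
        int endPage = 3;
        try (PDDocument document = PDDocument.load(new File("/tmp/example.pdf"))) {
            List<PDDocument> pages = extractPages(document, startPage, endPage);
            int pageNumber = startPage;
            for (PDDocument page : pages) {
                // Name each file after its page number in the original document
                try (PDDocument pd = page) {
                    pd.save("/tmp/page_" + pageNumber + ".pdf");
                }
                pageNumber++;
            }
        } catch (IOException e) {
            System.err.println("Exception while trying to split pdf document - " + e);
        }
    }
}
```

Rejecting an endPage beyond the last page is a design choice of this sketch; the Splitter itself would simply stop at the last page of the document.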

