This Java program extracts the text of each page in a PDF document and writes it to a separate file (output_0.txt, output_1.txt, and so on), using Qoppa’s jPDFText library.

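// Imports needed by this snippet
import java.io.FileWriter;
import com.qoppa.pdfText.PDFText;

// Load the document (null means no password is supplied)
PDFText pdfText = new PDFText ("input.pdf", null);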
 
// Loop through the pages
for (int pageIx = 0; pageIx < pdfText.getPageCount(); ++pageIx)
{
 // Get the text for the page
 String pageText = pdfText.getText(pageIx);
 
 // Save the text in a file; try-with-resources closes the writer even if write() fails
 try (FileWriter output = new FileWriter ("output_" + pageIx + ".txt"))
 {
  output.write(pageText);
 }
}
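
The file-writing step above can also be sketched on its own, without the PDF library, which makes it easy to try out. This is only an illustration under stated assumptions: the class PageTextWriter and the method writePages are hypothetical names, the list of strings stands in for the values returned by pdfText.getText(pageIx), and UTF-8 is chosen explicitly rather than relying on the platform default encoding that FileWriter would use.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of jPDFText.
public class PageTextWriter
{
    // Writes each entry of pageTexts to outputDir/output_<index>.txt as UTF-8.
    // pageTexts stands in for the strings returned by pdfText.getText(pageIx).
    static void writePages(List<String> pageTexts, Path outputDir) throws IOException
    {
        // Make sure the target directory exists before writing into it
        Files.createDirectories(outputDir);

        for (int pageIx = 0; pageIx < pageTexts.size(); ++pageIx)
        {
            Path target = outputDir.resolve("output_" + pageIx + ".txt");

            // try-with-resources closes the writer even if write() throws
            try (Writer output = Files.newBufferedWriter(target, StandardCharsets.UTF_8))
            {
                output.write(pageTexts.get(pageIx));
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Write two sample "pages" into a temporary directory and read one back
        Path dir = Files.createTempDirectory("pages");
        writePages(List.of("first page", "second page"), dir);
        System.out.println(Files.readString(dir.resolve("output_1.txt")));
    }
}
```

In the original program, the same loop body would receive pageText from pdfText.getText(pageIx) instead of from a list.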