With the Java PDF library jPDFText, you can extract text strings and their positions from invoices and statements using the PDFText.getLinesWithPosition method. Knowing the bounding rectangle and location of each text string lets you analyze the content of the invoice or statement and pull out the values of specific fields such as invoice date, customer name, customer address, […]
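As a rough illustration of the kind of position-based analysis described above, here is a minimal sketch that finds the value printed to the right of a field label. It does not call jPDFText: the TextLine class, the valueRightOf helper, and the sample coordinates are hypothetical stand-ins for whatever positioned text the library returns, and the "same row, nearest to the right" rule is only one simple heuristic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class InvoiceFieldFinder {

    // Hypothetical stand-in for a positioned text string; NOT a jPDFText type.
    static final class TextLine {
        final String text;
        final double x, y, width, height;

        TextLine(String text, double x, double y, double width, double height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Returns the text of the nearest line that starts to the right of the
    // label and overlaps it vertically (i.e. sits on the same row).
    static Optional<String> valueRightOf(List<TextLine> lines, String label) {
        TextLine anchor = null;
        for (TextLine line : lines) {
            if (line.text.trim().equalsIgnoreCase(label)) {
                anchor = line;
                break;
            }
        }
        if (anchor == null) {
            return Optional.empty();
        }

        TextLine best = null;
        for (TextLine line : lines) {
            if (line == anchor) {
                continue;
            }
            boolean rightOf = line.x >= anchor.x + anchor.width;
            boolean sameRow = line.y < anchor.y + anchor.height
                    && anchor.y < line.y + line.height;
            if (rightOf && sameRow && (best == null || line.x < best.x)) {
                best = line;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.text.trim());
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for one extracted page.
        List<TextLine> lines = new ArrayList<>();
        lines.add(new TextLine("Invoice Date:", 50, 100, 80, 12));
        lines.add(new TextLine("2024-01-15", 140, 101, 60, 12));
        lines.add(new TextLine("Customer Name:", 50, 120, 90, 12));
        lines.add(new TextLine("Acme Corp", 150, 120, 55, 12));

        // Prints 2024-01-15
        System.out.println(valueRightOf(lines, "Invoice Date:").orElse("(not found)"));
    }
}
```

Real invoices often place a value below its label or wrap it over several lines, so a production version would likely need additional rules per field.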
Articles Tagged: extract text
Code Sample: Extract text from each page on a PDF document (in Java)
Java program that extracts the text for each page in a PDF document and writes it to a file using Qoppa’s library jPDFText.

    // Load the document
    PDFText pdfText = new PDFText("input.pdf", null);

    // Loop through the pages
    for (int pageIx = 0; pageIx < pdfText.getPageCount(); ++pageIx) {
        // Get the text for […]