With the Java PDF library jPDFText, you can extract text strings and their positions from invoices and statements using the PDFText.getLinesWithPositions method.

Knowing the location and bounding coordinates of each text string lets you analyze the content of an invoice or statement and extract the values of specific fields, such as invoice date, customer name, customer address, invoice amount, or account number. You can then, for instance, save these values to a database or process them in an automated workflow.
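
Once the text of a field has been located, it usually still has to be converted to a typed value before it can be stored or processed. Here is a minimal sketch of that step, using only the JDK. The class name InvoiceFieldParser and its two methods are hypothetical, and the formats are assumptions: dates written like "Dec 25, 2017" and amounts written like "USD $590.00". Other invoices will need different patterns.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of jPDFText
public class InvoiceFieldParser
{
    // Assumed date layout: abbreviated English month, day, four-digit year
    private static final DateTimeFormatter DATE_FORMAT =
        DateTimeFormatter.ofPattern("MMM d, uuuu", Locale.ENGLISH);

    // Convert a date string such as "Dec 25, 2017" to a LocalDate
    static LocalDate parseDate(String text)
    {
        return LocalDate.parse(text.trim(), DATE_FORMAT);
    }

    // Convert an amount such as "USD $590.00" to a BigDecimal by
    // dropping everything except digits and the decimal point
    static BigDecimal parseAmount(String text)
    {
        return new BigDecimal(text.replaceAll("[^0-9.]", ""));
    }

    public static void main(String[] args)
    {
        System.out.println(parseDate("Dec 25, 2017"));   // prints 2017-12-25
        System.out.println(parseAmount("USD $590.00"));  // prints 590.00
    }
}
```

BigDecimal is used rather than double so that currency amounts keep their exact decimal value.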

See for instance the invoice shown in the screenshot below:

Sample Invoice with dates and currency amounts

 

Here is a sample Java program that extracts each line of text and its position from this invoice:

import java.awt.geom.Point2D;
import java.text.DecimalFormat;
import java.util.Vector;

// jPDFText classes
import com.qoppa.pdf.TextPosition;
import com.qoppa.pdfText.PDFText;

public class GetLinesWithPositions
{
    public static void main (String [] args)
    {
        try
        {
            // Load the document (null: no password)
            PDFText pdfText = new PDFText ("C:\\test\\sample_invoice.pdf", null);

            // Loop through the pages (page indexes are zero-based)
            for (int pageIx = 0; pageIx < pdfText.getPageCount(); ++pageIx)
            {
                // Echo page index
                System.out.println ("\n***** Page " + pageIx + " *****\n");

                // Get the lines of text in the page and their positions
                Vector lineList = pdfText.getLinesWithPositions(pageIx);

                // Echo each of the lines in the page
                for (int i = 0; i < lineList.size(); ++i)
                {
                    // Echo the line text followed by its four corner points
                    TextPosition tp = (TextPosition)lineList.get(i);
                    System.out.println (tp.getText() + " - " + echoQuad (tp.getQuadrilateral()));
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    // Format the corner points of a line as "(x,y) (x,y) (x,y) (x,y)"
    private static String echoQuad (Point2D [] quadPoints)
    {
        DecimalFormat decFormat = new DecimalFormat ("0.00");

        StringBuilder quadString = new StringBuilder ("Line Rectangle Coordinates:");
        for (Point2D point : quadPoints)
        {
            quadString.append (" (" + decFormat.format (point.getX()) + "," + decFormat.format (point.getY()) + ")");
        }

        return quadString.toString();
    }
}

Download Full Java Sample

Here is the output of this program:

***** Page 0 *****

Invoice - Line Rectangle Coordinates: (53.08,57.29) (157.62,57.29) (53.08,96.00) (157.62,96.00)
Customer Name - Line Rectangle Coordinates: (56.69,195.70) (135.66,195.70) (56.69,208.12) (135.66,208.12)
Street - Line Rectangle Coordinates: (56.69,208.90) (85.58,208.90) (56.69,221.32) (85.58,221.32)
Postcode City - Line Rectangle Coordinates: (56.69,222.10) (122.55,222.10) (56.69,234.52) (122.55,234.52)
Country - Line Rectangle Coordinates: (56.69,235.30) (96.20,235.30) (56.69,247.72) (96.20,247.72)
Invoice date: - Line Rectangle Coordinates: (56.69,270.50) (119.40,270.50) (56.69,282.92) (119.40,282.92)
Invoice number: - Line Rectangle Coordinates: (56.69,283.70) (136.92,283.70) (56.69,296.12) (136.92,296.12)
Payment due: - Line Rectangle Coordinates: (56.69,296.90) (123.65,296.90) (56.69,309.32) (123.65,309.32)
Description - Line Rectangle Coordinates: (56.69,326.60) (123.25,326.60) (56.69,339.02) (123.25,339.02)
Product Upgrades & Support - Line Rectangle Coordinates: (56.69,344.76) (196.90,344.76) (56.69,357.18) (196.90,357.18)
Total - Line Rectangle Coordinates: (56.69,362.36) (81.89,362.36) (56.69,374.78) (81.89,374.78)
Please transfer amount to: - Line Rectangle Coordinates: (56.69,410.76) (184.77,410.76) (56.69,423.18) (184.77,423.18)
Bank account name: - Line Rectangle Coordinates: (56.69,434.84) (156.97,434.84) (56.69,447.74) (156.97,447.74)
Name of Bank: - Line Rectangle Coordinates: (56.69,448.04) (129.45,448.04) (56.69,460.94) (129.45,460.94)
Bank State Branch (BSB): - Line Rectangle Coordinates: (56.69,461.24) (183.86,461.24) (56.69,474.14) (183.86,474.14)
Bank State Branch (BSB): - Line Rectangle Coordinates: (56.69,474.44) (183.86,474.44) (56.69,487.34) (183.86,487.34)
Bank State Branch (BSB): - Line Rectangle Coordinates: (56.69,487.64) (183.86,487.64) (56.69,500.54) (183.86,500.54)
Bank account number: - Line Rectangle Coordinates: (56.69,500.84) (166.75,500.84) (56.69,513.74) (166.75,513.74)
Bank SWIFT code: - Line Rectangle Coordinates: (56.69,514.04) (149.00,514.04) (56.69,526.94) (149.00,526.94)
Bank address: - Line Rectangle Coordinates: (56.69,527.24) (127.00,527.24) (56.69,540.14) (127.00,540.14)
Company Name - Line Rectangle Coordinates: (428.58,118.24) (508.05,118.24) (428.58,131.14) (508.05,131.14)
1 Main St - Line Rectangle Coordinates: (428.58,131.44) (475.05,131.44) (428.58,144.34) (475.05,144.34)
San Francisco CA 94122 - Line Rectangle Coordinates: (428.58,144.64) (550.86,144.64) (428.58,157.54) (550.86,157.54)
USA - Line Rectangle Coordinates: (428.58,157.84) (451.20,157.84) (428.58,170.74) (451.20,170.74)
www.domain.com - Line Rectangle Coordinates: (428.58,182.04) (515.37,182.04) (428.58,194.94) (515.37,194.94)
ABN 22 11 33 - Line Rectangle Coordinates: (428.58,195.24) (497.07,195.24) (428.58,208.14) (497.07,208.14)
Dec 25, 2017 - Line Rectangle Coordinates: (227.73,270.50) (289.61,270.50) (227.73,282.92) (289.61,282.92)
162222 - Line Rectangle Coordinates: (227.73,283.70) (263.22,283.70) (227.73,296.12) (263.22,296.12)
30 days after invoice date - Line Rectangle Coordinates: (227.73,296.90) (351.43,296.90) (227.73,309.32) (351.43,309.32)
From - Line Rectangle Coordinates: (227.73,326.60) (258.98,326.60) (227.73,339.02) (258.98,339.02)
Nov 26, 2016 - Line Rectangle Coordinates: (227.73,344.76) (291.98,344.76) (227.73,357.18) (291.98,357.18)
Until - Line Rectangle Coordinates: (330.11,326.60) (358.89,326.60) (330.11,339.02) (358.89,339.02)
Nov 26, 2017 - Line Rectangle Coordinates: (330.11,344.76) (393.67,344.76) (330.11,357.18) (393.67,357.18)
Amount - Line Rectangle Coordinates: (492.67,326.60) (538.54,326.60) (492.67,339.02) (538.54,339.02)
USD $590.00 - Line Rectangle Coordinates: (471.23,344.76) (538.54,344.76) (471.23,357.18) (538.54,357.18)
USD $590.00 - Line Rectangle Coordinates: (471.23,362.36) (538.54,362.36) (471.23,374.78) (538.54,374.78)
Company Name - Line Rectangle Coordinates: (227.73,434.84) (307.20,434.84) (227.73,447.74) (307.20,447.74)
Wells Fargo - Line Rectangle Coordinates: (227.73,448.04) (286.41,448.04) (227.73,460.94) (286.41,460.94)
063010 - Line Rectangle Coordinates: (227.73,461.24) (264.43,461.24) (227.73,474.14) (264.43,474.14)
063010 - Line Rectangle Coordinates: (227.73,474.44) (264.43,474.44) (227.73,487.34) (264.43,487.34)
063019 - Line Rectangle Coordinates: (227.73,487.64) (264.43,487.64) (227.73,500.54) (264.43,500.54)
13201652 - Line Rectangle Coordinates: (227.73,500.84) (276.66,500.84) (227.73,513.74) (276.66,513.74)
CTBAAU2S - Line Rectangle Coordinates: (227.73,514.04) (285.80,514.04) (227.73,526.94) (285.80,526.94)
420 Montgomery Street. San Francisco, CA 94104 - Line Rectangle Coordinates: (227.73,527.24) (474.12,527.24) (227.73,540.14) (474.12,540.14)
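
In the output above, a label and its value share the same top y coordinate. For example, "Invoice date:" and "Dec 25, 2017" both start at y = 270.50. Judging from these numbers, the four points of each line are listed in the order top-left, top-right, bottom-left, bottom-right. The sketch below uses that observation to pair a label with the nearest line to its right on the same row. It runs without jPDFText: the Line class, the valueFor method and the 1-point row tolerance are hypothetical choices made for illustration, and the coordinates are copied from the output above.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of jPDFText
public class LabelValueLookup
{
    // Assumed tolerance (in points) for treating two lines as the same row
    static final double ROW_TOLERANCE = 1.0;

    // One extracted line: its text and its bounding box
    static final class Line
    {
        final String text;
        final Rectangle2D box;

        Line(String text, double left, double top, double right, double bottom)
        {
            this.text = text;
            this.box = new Rectangle2D.Double(left, top, right - left, bottom - top);
        }
    }

    // A few lines from the output above (first and last point of each line)
    static List<Line> sampleLines()
    {
        return List.of(
            new Line("Invoice date:", 56.69, 270.50, 119.40, 282.92),
            new Line("Invoice number:", 56.69, 283.70, 136.92, 296.12),
            new Line("Product Upgrades & Support", 56.69, 344.76, 196.90, 357.18),
            new Line("Total", 56.69, 362.36, 81.89, 374.78),
            new Line("Dec 25, 2017", 227.73, 270.50, 289.61, 282.92),
            new Line("162222", 227.73, 283.70, 263.22, 296.12),
            new Line("Nov 26, 2016", 227.73, 344.76, 291.98, 357.18),
            new Line("Nov 26, 2017", 330.11, 344.76, 393.67, 357.18),
            new Line("USD $590.00", 471.23, 344.76, 538.54, 357.18),
            new Line("USD $590.00", 471.23, 362.36, 538.54, 374.78));
    }

    // Return the text of the closest line to the right of the label on the same row
    static Optional<String> valueFor(String label, List<Line> lines)
    {
        // Locate the first line whose text equals the label
        Line labelLine = lines.stream()
            .filter(l -> l.text.equals(label))
            .findFirst()
            .orElse(null);
        if (labelLine == null)
        {
            return Optional.empty();
        }

        Line best = null;
        for (Line candidate : lines)
        {
            boolean sameRow = Math.abs(candidate.box.getMinY() - labelLine.box.getMinY()) < ROW_TOLERANCE;
            boolean toTheRight = candidate.box.getMinX() >= labelLine.box.getMaxX();
            if (sameRow && toTheRight && (best == null || candidate.box.getMinX() < best.box.getMinX()))
            {
                best = candidate;
            }
        }
        return Optional.ofNullable(best).map(l -> l.text);
    }

    public static void main(String[] args)
    {
        List<Line> lines = sampleLines();
        System.out.println(valueFor("Invoice date:", lines).orElse("(not found)"));   // prints Dec 25, 2017
        System.out.println(valueFor("Invoice number:", lines).orElse("(not found)")); // prints 162222
        System.out.println(valueFor("Total", lines).orElse("(not found)"));           // prints USD $590.00
    }
}
```

This lookup matches the first line whose text equals the label, so a label that appears more than once, such as "Bank State Branch (BSB):" in this invoice, would need to be disambiguated by its position as well.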