How to skip PDF pages that have already been OCRed when recognizing text in a PDF

March 20, 2019

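If some pages of a document have already been OCRed, you can skip them by checking each page for invisible text before running OCR on it. Pages that already contain invisible text are left untouched: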

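// pdf is assumed to be a document that has already been loaded
TessJNI ocr = new TessJNI();
for (int count = 0; count < pdf.getPageCount(); ++count)
{
    PDFPage page = pdf.getPage(count);
    // OCR only the pages that do not already contain invisible text
    if (!page.containsInvisibleText())
    {
        // recognize English text ("eng") at 300 DPI and insert the hOCR result into the page
        String pageOCR = ocr.performOCR("eng", page, 300);
        page.insert_hOCR(pageOCR, true);
    }
}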
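
The skip check also makes the loop safe to run more than once on the same document. The sketch below illustrates that behavior without the Qoppa library: `PageLike`, `OcrEngine`, `FakePage` and `ocrMissingPages` are hypothetical stand-ins invented for this example (they are not Qoppa classes), and only the `containsInvisibleText()` check mirrors the loop above.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipOcredPages {
    /** Hypothetical stand-in for a PDF page; models only the calls the loop needs. */
    interface PageLike {
        boolean containsInvisibleText();
        void insertHocr(String hocr);
    }

    /** Hypothetical stand-in for an OCR engine that returns hOCR for a page. */
    interface OcrEngine {
        String recognize(PageLike page);
    }

    /** Runs OCR only on pages without invisible text; returns the indices it processed. */
    static List<Integer> ocrMissingPages(List<? extends PageLike> pages, OcrEngine engine) {
        List<Integer> processed = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            PageLike page = pages.get(i);
            if (page.containsInvisibleText()) {
                continue; // page already has a text layer, skip it
            }
            page.insertHocr(engine.recognize(page));
            processed.add(i);
        }
        return processed;
    }

    /** In-memory fake page used only to exercise the loop. */
    static final class FakePage implements PageLike {
        private boolean hasText;
        FakePage(boolean hasText) { this.hasText = hasText; }
        @Override public boolean containsInvisibleText() { return hasText; }
        @Override public void insertHocr(String hocr) { hasText = true; }
    }

    public static void main(String[] args) {
        List<FakePage> pages =
            List.of(new FakePage(true), new FakePage(false), new FakePage(false));
        OcrEngine fake = page -> "<div class='ocr_page'></div>";
        System.out.println(ocrMissingPages(pages, fake)); // prints [1, 2]
        System.out.println(ocrMissingPages(pages, fake)); // prints [] because every page now has text
    }
}
```

Returning the processed indices is a design choice for this sketch: it gives the caller a simple way to log which pages were actually sent to OCR and which were skipped.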

Tagged: OCR, invisible text