Java Sample Code to Recognize (OCR) and Add Text to a PDF Document

Here is a short Java program that uses Qoppa’s PDF library jPDFProcess and the Tesseract libraries to recognize text in a PDF and add it as invisible text on each PDF page:

// Load a PDF that contains scanned pages needing to be OCRed
PDFDocument pdfDoc = new PDFDocument("C:/test/test.pdf", null);
// initialize the OCR […]

OCR Languages Download Links

Afrikaans; Albanian – shqip; Arabic – العربية; Azerbaijani – azərbaycan; Basque – euskara; Belarusian – беларуская; Bengali – বাংলা; Bulgarian – български; Catalan – català; Cherokee – ᏣᎳᎩ ᎦᏬᏂᎯᏍᏗ; Chinese (Simplified) – 中文简体中文; Chinese (Traditional) – 中文繁體; Croatian – hrvatski; Czech – čeština; Danish – dansk; Estonian – eesti; Galician – galego; Greek – Ελληνικά; Hebrew – עברית; Hindi […]

Creating Searchable PDF from Image Files

Q: Can we convert image files into searchable PDF documents by performing OCR with Qoppa’s Java PDF library?

A: Yes, you can do that with jPDFProcess.

1. Convert Images to PDF Pages

The first step is to create a PDF from the images:

// create a new PDF document
PDFDocument pdfDoc = new PDFDocument();
// […]

PDF OCR With Multiple Languages

To run OCR with multiple languages, for instance English and French, join the language codes with "+" and call:

com.qoppa.ocr.TessJNI.performOCR("eng+fra", myPage, 200);
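When the list of languages comes from configuration or user input, the "+"-separated argument can be assembled programmatically. The following is a minimal sketch in plain Java; the class name `OcrLanguages` and the method `join` are hypothetical helpers for illustration and are not part of jPDFProcess or Tesseract. It only builds the string, in the same form as the "eng+fra" example above.

```java
import java.util.List;

// Hypothetical helper (not part of jPDFProcess or Tesseract):
// builds a multi-language argument such as "eng+fra".
final class OcrLanguages {

    private OcrLanguages() {
    }

    // Joins language codes with '+', rejecting input that would
    // produce a malformed argument.
    static String join(List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("at least one language code is required");
        }
        for (String code : codes) {
            if (code == null || code.trim().isEmpty() || code.contains("+")) {
                throw new IllegalArgumentException("invalid language code: " + code);
            }
        }
        return String.join("+", codes);
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("eng", "fra"))); // prints eng+fra
    }
}
```

The resulting string would then be passed as the first argument of the `performOCR` call shown above, in place of the hard-coded "eng+fra".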

New Languages Supported in OCR

v2015R2 added OCR support for non-Latin and CJK languages. New Latin languages have also been added to the list of available languages. Here is a complete list of the newly added OCR languages: Afrikaans; Albanian – shqip; Arabic – العربية; Azerbaijani – azərbaycan; Basque – euskara; Belarusian – беларуская; Bengali – বাংলা; Bulgarian […]
