As of version 2013R2, jPDFEditor, Qoppa’s Java PDF editing component, has an optional OCR function available.

OCR is also available in jPDFNotes and the steps for integration are the same as for jPDFEditor.

Follow the instructions below to add an “OCR” button to the toolbar so your users can perform OCR on PDF documents open in Qoppa’s visual PDF component.

Please contact us regarding licensing this additional feature.

How to Activate OCR in jPDFEditor

To get started, you can download:

  • the latest jPDFEditor version from our standard download page:

https://www.qoppa.com/pdfeditor/demo/download

  • the JNI native bridge files from here:

https://www.qoppa.com/files/pdfprocess/ocr/libtessjni.zip

The JNI zip file contains native library builds for Windows, Linux and Mac OS X, each in 32-bit and 64-bit versions. At runtime, these native libraries need to be present on the machine that is running the software. If you are running in an application, you can bundle the native libraries with your installation. If you are running in an applet, you will probably want to fetch these files from a server on demand: when the user chooses to use OCR, the applet can download the file matching the OS and bitness from your web server to a local folder.
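Choosing the file that matches "the OS and bitness" can be sketched as below. This is only an illustration: the class name and the returned folder names (such as "windows-64") are hypothetical, so substitute the names actually found inside libtessjni.zip. The only real inputs are the standard Java system properties os.name and os.arch.

```java
import java.util.Locale;

/** Sketch: pick the sub-folder holding the native OCR library for the running JVM. */
final class NativeLibraryPicker {

    /**
     * Maps "os.name" and "os.arch" values to a folder name.
     * The returned names are hypothetical placeholders, not the real zip layout.
     */
    static String folderFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String family;
        // Test for Mac first: the string "darwin" also contains "win".
        if (os.contains("mac") || os.contains("darwin")) {
            family = "mac";
        } else if (os.contains("win")) {
            family = "windows";
        } else if (os.contains("linux")) {
            family = "linux";
        } else {
            throw new IllegalArgumentException("Unsupported OS: " + osName);
        }
        // os.arch describes the JVM, which is what a JNI library has to match:
        // a 32-bit JVM on a 64-bit OS still needs the 32-bit library.
        String bits = osArch.contains("64") ? "64" : "32";
        return family + "-" + bits;
    }

    /** Convenience wrapper that reads the properties of the current JVM. */
    static String folderForCurrentJvm() {
        return folderFor(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

An applet or installer could call NativeLibraryPicker.folderForCurrentJvm() to decide which file to download or copy into the local folder.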

  • the OCR language files from here:

https://kbdeveloper.qoppa.com/ocr-language-download-links/

The language zip file contains language files for English, German, French, Spanish and Italian. The files inside the zip file come directly from the Tesseract project site. They are archive files, one per language, which you will need to uncompress so that jPDFEditor can use them. The local machine only needs the language files for the languages that you want to support. In an applet, you will probably also want to install these on demand by downloading the files from your server when necessary.

To activate the OCR functionality, the 2 steps are:

  1. Call OCRBridge.initialize() with the paths to these directories:

     OCRBridge.initialize(String libraryPath, String dataPath);

    • libraryPath is the path to the folder where the native libraries are located
    • dataPath is the path to the folder where the uncompressed OCR language files are located
  2. Call PDFEditorBean.activateOCR() on a PDFEditorBean instance to enable OCR for that instance. This method adds a button to the toolbar and returns a reference to the button in case you want to move it to a different container.
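Before running the activation steps above, it can help to confirm that both folders exist, since a wrong path is an easy mistake to make. The sketch below does only that check with the standard library; the class name, method name and default folder names are hypothetical, and the Qoppa calls appear as comments because they need the Qoppa jars on the classpath.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Sketch: sanity-check the two folders before handing them to OCRBridge.initialize(). */
final class OcrSetupCheck {

    /** Returns one message per missing folder; an empty list means both exist. */
    static List<String> findProblems(String libraryPath, String dataPath) {
        List<String> found = new ArrayList<>();
        if (!new File(libraryPath).isDirectory()) {
            found.add("Native library folder not found: " + libraryPath);
        }
        if (!new File(dataPath).isDirectory()) {
            found.add("Language data folder not found: " + dataPath);
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical default locations; pass your real folders as arguments.
        String libraryPath = args.length > 0 ? args[0] : "ocr/native";
        String dataPath = args.length > 1 ? args[1] : "ocr/languages";

        List<String> issues = findProblems(libraryPath, dataPath);
        if (!issues.isEmpty()) {
            issues.forEach(System.err::println);
            return;
        }
        // With the Qoppa jars available, the two steps from this article follow:
        //   OCRBridge.initialize(libraryPath, dataPath);
        //   editorBean.activateOCR();   // editorBean: your PDFEditorBean instance
        System.out.println("OCR folders look usable.");
    }
}
```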

Download OCR sample code that demonstrates all this.

Additional Languages

Additional languages, including non-Latin and CJK languages, can be downloaded from OCR Languages Download Links.

Extract the archives and place all files for a language in the “tessdata” directory. Add entries to languages.xml to convert the language prefix into the name shown in the language combo box.
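To check which languages ended up in the “tessdata” directory after extraction, a small helper like the one below can list the language prefixes. It is a sketch that assumes each language includes a file named after its prefix with the .traineddata extension (for example eng.traineddata), which is Tesseract’s usual naming. The class and method names are hypothetical and not part of the Qoppa API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: derive installed language prefixes from the file names in tessdata. */
final class InstalledLanguages {

    private static final String SUFFIX = ".traineddata";

    /** Keeps names ending in ".traineddata" and returns their prefixes, sorted. */
    static List<String> codesFrom(List<String> fileNames) {
        List<String> codes = new ArrayList<>();
        for (String name : fileNames) {
            if (name.endsWith(SUFFIX) && name.length() > SUFFIX.length()) {
                codes.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(codes);
        return codes;
    }

    /** Scans a directory; returns an empty list if it is missing or unreadable. */
    static List<String> scan(File tessdataDir) {
        String[] names = tessdataDir.list();
        if (names == null) {
            return new ArrayList<>();
        }
        return codesFrom(Arrays.asList(names));
    }
}
```

The prefixes returned by scan() are the values that would need matching entries in languages.xml.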
