In a PDF document, different glyphs can map to the same Unicode character. When that happens, you see different characters when you read the PDF, but if you select and extract the text, the underlying character is the same. Sometimes this is justified, but most often it happens because the file was created incorrectly and the underlying Unicode characters are wrong.
This cannot happen in an HTML5 file, because the text that you see is the underlying text. What you see is what you get…
For this reason, when converting a PDF file to HTML5 using jPDFWeb, developers can decide how the text is converted by setting the “TextFidelity” option to one of the two values below:
- “Display”: This will preserve the visual fidelity from the original PDF. This is the default value if not set.
- “TextExtract”: This will preserve the underlying text in the original PDF.
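To make the difference between the two values concrete, here is a minimal, self-contained Java sketch. It does not use the jPDFWeb API and is not how jPDFWeb is implemented: the `GlyphMappingSketch` class, the `Glyph` type, and the sample data are all hypothetical, chosen only to show how the text a reader sees can differ from the text that extraction returns.

```java
public class GlyphMappingSketch {

    /** Hypothetical glyph: the shape drawn on the page and the text its font maps it to. */
    static final class Glyph {
        final String drawnShape;  // what the reader sees
        final String mappedText;  // what select-and-extract returns

        Glyph(String drawnShape, String mappedText) {
            this.drawnShape = drawnShape;
            this.mappedText = mappedText;
        }
    }

    /** Text as it appears visually (what a "Display"-style conversion aims to keep). */
    static String visibleText(Glyph[] glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.drawnShape);
        }
        return sb.toString();
    }

    /** Text as stored in the mapping (what a "TextExtract"-style conversion aims to keep). */
    static String extractedText(Glyph[] glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.mappedText);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample data: two different shapes ("c" and "a") that an
        // incorrectly built font maps to the same underlying character "a".
        Glyph[] run = {
            new Glyph("c", "a"),
            new Glyph("a", "a"),
            new Glyph("t", "t")
        };
        System.out.println("Visible:   " + visibleText(run));   // prints "Visible:   cat"
        System.out.println("Extracted: " + extractedText(run)); // prints "Extracted: aat"
    }
}
```

In this sketch the page reads "cat" while extraction yields "aat". Since HTML5 text cannot carry both at once, a conversion has to pick one of them, which is the choice the “TextFidelity” option exposes.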
Here is a simple code sample to change the “TextFidelity” option:
// load a PDF file
PDFWeb pdfWeb = new PDFWeb("file.pdf", null);

// create a new SVGOptions object
SVGOptions options = new SVGOptions();

// change the default option to preserve the underlying text instead of the visual appearance
options.setExtraOption("TextFidelity", "TextExtract");

// save the first page of the PDF to SVG (page indexes start at 0)
pdfWeb.savePageAsSVG(0, options, "file_page1.svg");