Multiple glyphs mapping to the same Unicode character when converting PDF to HTML5

In a PDF document, different glyphs can map to the same Unicode character. When this happens, you see different characters when you read the PDF, but if you select and extract the text, the underlying character is the same. Sometimes this is intentional, but most often it happens because the file was created incorrectly and the underlying Unicode characters are wrong.

This mismatch is not possible in an HTML5 file, because the text that you see is the same as the underlying text. What you see is what you get…

For this reason, when converting a PDF file to HTML5 using jPDFWeb, developers can decide how the text is converted by setting the “TextFidelity” option to one of the two values below:

“Display”: This will preserve the visual fidelity from the original PDF. This is the default value if not set.
“TextExtract”: This will preserve the underlying text in the original PDF.
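
To make the trade-off concrete, here is a minimal, self-contained sketch in plain Java. It does not use jPDFWeb and does not show how jPDFWeb works internally. The Glyph class, the sample word pair and all method names are hypothetical and exist only for illustration. It models a badly built font in which the glyphs drawn as "a" and "o" both map to U+0061, then compares the text a reader sees with the text that extraction returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class GlyphCollisionDemo {

    /** Hypothetical model of one drawn glyph (not a jPDFWeb type). */
    static final class Glyph {
        final String shape;   // the character the glyph outline looks like on the page
        final int codePoint;  // the Unicode value the PDF font assigns to that glyph

        Glyph(String shape, int codePoint) {
            this.shape = shape;
            this.codePoint = codePoint;
        }
    }

    /** The text a person reads on the page (what "Display" aims to keep). */
    static String visibleText(List<Glyph> run) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : run) {
            sb.append(g.shape);
        }
        return sb.toString();
    }

    /** The text returned by select-and-extract (what "TextExtract" aims to keep). */
    static String extractedText(List<Glyph> run) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : run) {
            sb.appendCodePoint(g.codePoint);
        }
        return sb.toString();
    }

    /** Code points that are shared by more than one distinct glyph shape. */
    static Map<Integer, Set<String>> collisions(List<Glyph> run) {
        Map<Integer, Set<String>> byCodePoint = new TreeMap<>();
        for (Glyph g : run) {
            byCodePoint.computeIfAbsent(g.codePoint, k -> new TreeSet<>()).add(g.shape);
        }
        // Keep only the code points that more than one shape maps to.
        byCodePoint.values().removeIf(shapes -> shapes.size() < 2);
        return byCodePoint;
    }

    /** Sample run: the page shows "cat cot", but the "o" glyph is mapped to U+0061 ('a'). */
    static List<Glyph> sampleRun() {
        return Arrays.asList(
            new Glyph("c", 'c'), new Glyph("a", 'a'), new Glyph("t", 't'),
            new Glyph(" ", ' '),
            new Glyph("c", 'c'), new Glyph("o", 'a'), new Glyph("t", 't'));
    }

    public static void main(String[] args) {
        List<Glyph> run = sampleRun();
        System.out.println("Visible:    " + visibleText(run));
        System.out.println("Extracted:  " + extractedText(run));
        System.out.println("Collisions: " + collisions(run));
    }
}
```

In this toy model the page reads "cat cot" while extraction yields "cat cat". An HTML5 output can carry only one of those two strings, which is the choice the “TextFidelity” option exposes.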

Here is a simple code sample to change the “TextFidelity” option:

// load a PDF file
PDFWeb pdfWeb = new PDFWeb("file.pdf", null);
// create a new SVGOptions object
SVGOptions options = new SVGOptions();
// change the default option to preserve underlying text fidelity
options.setExtraOption("TextFidelity", "TextExtract");
// save the first page of the PDF as SVG
pdfWeb.savePageAsSVG(0, options, "file_page1.svg");
